Reading Speed and Comprehension as a Function of Typography


Alvin J. North and L. B. Jenkins 
Southern Methodist University 


Recently a student of journalism, R. B. 
Andrews (1), proposed a style of typography 
called “square span” in which the material is 
arranged in double-line blocks, as in the 
following: 

This is        of the
an example     square span

According to Andrews, square span should 
aid reading speed and comprehension by 
effectively utilizing both the horizontal and 
vertical visual span and by grouping the 
words into thought units. In Andrews’ study 
12 subjects were tested. Six of these were 
timed on each of five pages of square span 
text. A week later they were timed on the 
same five pages in standard form. The other 
six subjects were timed over the same mate- 
rials, but in the reverse order. Of the 60 
pairs of timings, 32 were in favor of square 
span, 24 in favor of standard form, and 4 
indifferent. Although indicative, these results 
are not conclusive, and further research is 
required.
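
Why a 32-to-24 split is only indicative can be seen from an exact two-sided sign test on the 56 untied pairs. The sketch below is an illustration and not part of Andrews' analysis. It treats the pairs as independent, which is an assumption, since each of the 12 subjects contributed five pairs; the function name is hypothetical.

```python
from math import comb

def sign_test_two_sided(wins: int, losses: int) -> float:
    """Exact two-sided sign test; tied pairs are dropped before calling."""
    n = wins + losses
    k = max(wins, losses)
    # Probability of k or more successes out of n under a fair coin.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

# Counts as reported: 32 pairs favored square span, 24 favored
# standard form, and 4 were indifferent (dropped here).
p_value = sign_test_two_sided(32, 24)
print(f"two-sided p = {p_value:.3f}")
```

Under these assumptions the probability is well above the conventional 5% level, which agrees with the statement that the results are not conclusive.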

Some unpublished preliminary work by the 
junior author seems relevant. When material 
in square span form was tachistoscopically 
presented, perception was found to depend 
more on the familiarity and unity of the 
thought content than on the sheer amount of 
material. Perhaps, then, the main advantage, 
if any, of square span lies in the grouping of 
words into thought units. 

In general, the typographical arrangement 
of material may provide cues for the appro- 
priate organization of the thought content. 
In addition to square span, a second style of 
typography utilizing this principle was devised 
and was called “spaced unit.” 


This is an example     of the spaced unit     style of presentation


Note that the spaced unit style is like the 
square span style in grouping the words 
according to thought units, but more closely 
approximates standard typography in its 
unilinear arrangement. 

The purpose of the present study, then, was 
to compare square span, spaced unit, and 
standard typography in terms of reading 
speed and comprehension. A secondary pur- 
pose was to determine the influence of two 
limited degrees of practice with a specified 
style of typography. 


Method


The subjects were 180 university freshmen 
enrolled in two large sections of a social 
science course. The materials, which were 
mimeographed and then assembled in booklet 
form, consisted of three articles selected from 
a popular magazine (2, 5, 6) together with an 
objective test. Articles 1 and 2 were used for
practice reading, and Article 3 was used as the
test article; i.e., the one to which the reading 
test pertained. According to the revised 
Flesch formula (4), this test article had a 
“reading ease” score of 36.9 and a “human 
interest” score of 11.6. These scores indicate 
a moderately difficult style. The reading 
test consisted of 23 questions over the factual 
content of the test Article 3. 
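
For readers unfamiliar with the two Flesch scores, the sketch below shows how they are computed from simple counts. The coefficients are those of the widely cited 1948 revision of the Flesch formulas, which is assumed here to be the formula cited as (4). The function names and the sample counts are hypothetical and are not data from the test article.

```python
def reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch reading ease (1948 revision); higher scores mean easier text."""
    avg_sentence_length = words / sentences
    avg_syllables_per_word = syllables / words
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word

def human_interest(words: int, sentences: int,
                   personal_words: int, personal_sentences: int) -> float:
    """Flesch human interest (1948 revision); higher scores mean more personal text."""
    personal_words_per_100 = 100 * personal_words / words
    personal_sentences_per_100 = 100 * personal_sentences / sentences
    return 3.635 * personal_words_per_100 + 0.314 * personal_sentences_per_100

# Hypothetical sample: 100 words, 5 sentences, 150 syllables,
# 2 personal words and 1 personal sentence.
ease = reading_ease(words=100, sentences=5, syllables=150)
interest = human_interest(words=100, sentences=5,
                          personal_words=2, personal_sentences=1)
print(f"reading ease = {ease:.1f}, human interest = {interest:.1f}")
```

Longer sentences and longer words both lower the reading ease score, so a score of 36.9 reflects comparatively long sentences and polysyllabic words.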

The design of the experiment is shown in 
Table 1. One experimental variable was style 
of typography of the test article. The subjects 
were divided at random into three groups. 
Group I read the test article in square span 
form; Group II, in spaced unit form; and 
Group III, in standard form. In breaking 
the material into smaller units for use of 
square span and spaced unit styles, an attempt 
was made to group words into thought units. 

Table 1
Design of the Experiment

              Style of Typography            Style of Typography
              on Practice Articles           on Test Article
Condition     Article 1      Article 2       Article 3

IA            Square span    Square span     Square span
IB            Standard       Square span     Square span
IIA           Spaced unit    Spaced unit     Spaced unit
IIB           Standard       Spaced unit     Spaced unit
IIIA          Standard       Standard        Standard
IIIB          Square span    Standard        Standard

The text was broken down into the same word 
groups in square span and spaced unit styles. 
The second experimental variable was the 
amount of prior practice on whichever style 
of typography was used in the test article. 
Each of the three major groups was divided at 
random into two subgroups. Subgroup A had 
four minutes of practice (on Article 2) and 
Subgroup B had eight minutes of practice (on
Articles 1 and 2) with the style used in the 
test article. Thus it was possible to determine 
whether the advantage, if any, of a given style 
of typography was a function of the amount of 
practice with it, at least within the limited 
range studied. It is realized, of course, that 
subjects were highly practiced with the stand- 
ard style even before the experiment began. 
The procedure was as follows. Each subject 
was given a mimeographed booklet containing 
the materials and instructions. All subjects
began simultaneously and read Article 1
for four minutes until the signal “stop” was 
given. Then Article 2 was read for four 
minutes. Next test Article 3 was read for two 
minutes. After the signal to stop had been 
given, the subjects were told to mark the 
place in Article 3 which they had reached in 
their reading. Finally the subjects turned to 
the objective test and answered the questions 
over test Article 3. 
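
The two-stage random division described above (three style groups, each split into two practice subgroups) can be sketched as follows. Equal subgroup sizes of 30 are an assumption made for illustration, since the subgroup sizes are not reported; the function name and the seed are likewise hypothetical.

```python
import random

def assign_subjects(n_subjects: int = 180, seed: int = 0) -> dict:
    """Randomly divide subjects into conditions IA, IB, IIA, IIB, IIIA, IIIB."""
    rng = random.Random(seed)
    subjects = list(range(1, n_subjects + 1))
    rng.shuffle(subjects)  # one shuffle makes every later slice a random sample
    third = n_subjects // 3
    design = {}
    for index, label in enumerate(["I", "II", "III"]):
        group = subjects[index * third:(index + 1) * third]
        half = len(group) // 2
        design[label + "A"] = group[:half]
        design[label + "B"] = group[half:]
    return design

design = assign_subjects()
for condition, members in design.items():
    print(condition, len(members))
```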


Results 


The data were analyzed with respect to
three measures of performance: (1) reading
speed, as measured by the number of words read
in two minutes on the test article (indicated by
place in text marked by subject), (2) comprehension,
as measured by the number of
questions answered correctly in the test, and
(3) accuracy, as measured by the per cent of
correct answers to questions pertaining only
to that portion of the test article marked by
a subject as read.

Table 2
Means of Reading Speed, Comprehension, and Accuracy Scores*

                          Reading Speed     Comprehension      Accuracy
Condition     Subgroup    (No. of Words)    (Items Correct)    (Per Cent)

Square span   IA              434.3              7.5             87.1
              IB              416.3              7.3             82.8
Spaced unit   IIA             491.5              8.5             87.1
              IIB             495.0              8.6             86.9
Standard      IIIA            438.9              7.3             85.6
              IIIB            450.5              7.8             84.9

* Unfortunately S.D.'s were lost in moving but original work sheets showed S.D.'s of the groups and subgroups
were sufficiently similar to satisfy the assumption of homogeneity of variance, which is involved in analysis
of variance.

Table 3
Analysis of Variance of Reading Speed Scores

Source of                                  Variance      F
Variation               SS         DF      Estimate      Ratio

Styles                155,700       2      77,850.0      9.36*
Practice                   80       1          80.0       .01
Styles X Practice       7,103       2       3,551.5       .43
Error               1,488,263     179       8,314.3

* Significant beyond 1% level of confidence. Difference
between spaced unit and square span means is significant
at 1% level; between spaced unit and standard, 1%
level; between square span and standard, non-significant.

Table 5
Analysis of Variance of Accuracy Scores

Source of                                  Variance      F
Variation               SS         DF      Estimate      Ratio

Styles                    152       2          76.0       .70
Practice                  129       1         129.0      1.18
Styles X Practice         164       2          82.0       .75
Error                  19,557     179         109.3

The mean performance of each group and 
subgroup in terms of each of these measures 
is shown in Table 2. The apparent differences 
were then tested for statistical significance by 
the analysis of variance method. The results 
of the analysis of variance of reading speed 
scores appear in Table 3. Significantly more 
words were read with spaced unit typography 
than with either square span or standard 
typography. There were no significant differ- 
ences attributable to amount of practice with 
a given style. 

The analysis of variance of comprehension 
scores is reported in Table 4. The results 
parallel those for reading speed: significantly
more questions were answered correctly by the 
spaced unit group than by either the square 
span or standard groups. And again the 
amount of practice with a style resulted in no 
significant differences. 


Table 4
Analysis of Variance of Comprehension Scores

Source of                                  Variance      F
Variation               SS         DF      Estimate      Ratio

Styles                     44       2         22.00      3.81*
Practice                    2       1          2.00       .35
Styles X Practice           5       2          2.50       .43
Error                   1,032     179          5.77

* Significant at 5% level of confidence. Difference
between spaced unit and square span means is significant;
between spaced unit and standard, significant;
between square span and standard, non-significant.


The analysis of variance of accuracy scores 
is shown in Table 5. No significant differences 
attributable to style of typography or amount 
of prior practice were found. There was no 
evidence that the faster reading of the spaced 
unit group was associated with lower accuracy. 
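
The F ratios in Tables 3, 4, and 5 can be checked from the printed sums of squares and degrees of freedom: each variance estimate is SS divided by DF, and each F ratio is that estimate divided by the error variance estimate. The sketch below is a reader's check and not the authors' computation. It uses the figures exactly as printed, including the 179 degrees of freedom listed for error, and the function and variable names are hypothetical.

```python
def f_ratios(sources: dict, error: tuple) -> dict:
    """Return {source: (variance_estimate, F)} from (SS, DF) pairs."""
    error_variance = error[0] / error[1]
    return {
        name: (ss / df, (ss / df) / error_variance)
        for name, (ss, df) in sources.items()
    }

# (SS, DF) for each source, then (SS, DF) for error, as printed in the tables.
TABLES = {
    "Reading speed (Table 3)": (
        {"Styles": (155_700, 2), "Practice": (80, 1), "Styles X Practice": (7_103, 2)},
        (1_488_263, 179),
    ),
    "Comprehension (Table 4)": (
        {"Styles": (44, 2), "Practice": (2, 1), "Styles X Practice": (5, 2)},
        (1_032, 179),
    ),
    "Accuracy (Table 5)": (
        {"Styles": (152, 2), "Practice": (129, 1), "Styles X Practice": (164, 2)},
        (19_557, 179),
    ),
}

results = {title: f_ratios(sources, error) for title, (sources, error) in TABLES.items()}
for title, rows in results.items():
    print(title)
    for name, (variance, f) in rows.items():
        print(f"  {name:<18} variance = {variance:>10.2f}  F = {f:.2f}")
```

The recomputed ratios agree with the printed ones to within rounding of the error variance estimate.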


Discussion 


In terms of both reading speed and compre- 
hension measures of performance, spaced unit 
typography was superior to either square span 
or standard typography. Although the 
amount of superiority was not marked, it 
nevertheless was statistically significant and 
appeared even after but a minimal amount of 
practice. 
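
The size of the advantage can be gauged from the subgroup means in Table 2. The sketch below averages the two subgroup means for each style, which assumes equal subgroup sizes (these are not reported), and expresses the spaced unit means as a percentage gain over the other two styles. The function names are hypothetical.

```python
# Subgroup means from Table 2: (words read in two minutes, items correct).
TABLE_2 = {
    "square span": {"A": (434.3, 7.5), "B": (416.3, 7.3)},
    "spaced unit": {"A": (491.5, 8.5), "B": (495.0, 8.6)},
    "standard": {"A": (438.9, 7.3), "B": (450.5, 7.8)},
}

def style_means(table: dict) -> dict:
    """Average the two subgroup means per style (assumes equal subgroup sizes)."""
    return {
        style: tuple(sum(values) / len(values) for values in zip(*subgroups.values()))
        for style, subgroups in table.items()
    }

def percent_advantage(value: float, baseline: float) -> float:
    """Percentage by which value exceeds baseline."""
    return 100 * (value - baseline) / baseline

means = style_means(TABLE_2)
spaced_speed, spaced_items = means["spaced unit"]
for other in ("square span", "standard"):
    other_speed, other_items = means[other]
    print(
        f"spaced unit vs {other}: "
        f"speed {percent_advantage(spaced_speed, other_speed):+.1f}%, "
        f"comprehension {percent_advantage(spaced_items, other_items):+.1f}%"
    )
```

On these figures the spaced unit group read roughly 11 per cent more words than the standard group in the two minutes allowed.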

How may this advantage of spaced unit 
typography be explained? Let us assume that 
reading, the perception of printed text, involves 
the organization of the material into a mean- 
ingful structure. This point of view is similar 
to that of Dolch (3). This process of organiza- 
tion of the thought depends on many factors 
or cues. The hypothesis is offered that spaced
unit typography provides cues for the organiza- 
tion of the thought, and hence aids reading. 
As such, spaced unit style is functionally 
equivalent to punctuation and grammatical 
elements such as prepositions and conjunctions. 
It might be said that the present study demon- 
strates that our present punctuational and 
grammatical cues may be supplemented to an 
advantage when the material is relatively 
complex. 

If the hypothesis that typographical arrange- 
ment may provide cues for the organization of 
thought is correct, why was square span 
inferior to spaced unit and possibly even 
standard typography? The probable answer 
is that square span is a radical departure from
standard typography and thus interferes with 
established reading habits. Original or inten- 
sive training with square span might have 
other results. 

The hypothesis that typographical and 
other cues for the organization of thought aid 
reading suggests a number of lines of further 
research. The value of the spaced unit style 
could be studied as a function of the difficulty 
of the material. Presumably more auxiliary
cues are required for the perception of difficult 
and unfamiliar material than for easy and 
familiar material. The material of the present 
study was moderately difficult; with highly 
technical material the advantages of spaced 
unit typography might be more dramatic. 

Another line of research would be to examine 
a variety of styles of typography in order to 
determine which furnishes optimal cues for 
the organization of thought. Wider separa- 
tions between groups of words or a system of 
slashes or dots (as used in symbolic logic 
notation) are possibilities. In addition, it 
would be desirable to work with various age
groups and with subjects who had been given
intensive training with such a set of cues.
Finally, fundamental research is required to
determine the appropriate meaningful structuring
of the text. Presumably textual
material does not consist of a string of coordinate
units, but has a complex hierarchical
structure. When this structure is better
understood, typographical and other cues may
be applied with greater objectivity and
efficiency.




Summary 


In a factorial design experiment, three 
styles of typography were compared. In 
terms of reading speed and comprehension 
measures, spaced unit was superior to square 
span or standard typography. This advantage 
in reading speed was not accompanied by a 
loss in accuracy of retention. No differences 
attributable to the two limited degrees of 
practice were found. 

The hypothesis is offered that spaced unit 
typography facilitates reading by providing 
auxiliary cues for the organization of the
thought. Further research with materials 
of various levels of difficulty, other typo- 
graphical cues, and trained subjects is recom- 
mended. The study of the organization of 
thought in reading is prerequisite to objective 
and efficient utilization of typographical and 
other cues. 

Finally, the spaced unit style as used in the 
present study may have immediate practical 
application, perhaps in academic and military 
instructional materials. 


Received September 20, 1950. 
New Ideas in Industrial Psychology *


Edwin E. Ghiselli 
University of California, Berkeley 


In a review of any kind it is clearly necessary 
to do some selection of material. This is 
particularly true of the present paper since 
the number of concepts I plan to discuss is 
quite small, being only six. But the mere 
process of selection itself may be revealing. 
Thus I am asked to indicate and describe 
some of the current concepts in the field of 
industrial psychology which appear to me to 
be important. Some information concerning 
trends will be obtained from the particular 
concepts that I have chosen to present, but 
in addition, I think, further significant data 
will be given when the nature of these concepts 
is considered in light of my particular biases. 
That which is perceived as being important 
becomes more meaningful when one knows 
something about the nature of the perceiver. 

It has been the custom to classify psychol- 
ogists into James’ two types, the tough minded 
and the tender minded. Such a classification 
obviously is but a caricature and bears no 
relationship to reality. Nonetheless, we can 
say that the tough minded view themselves 
as being rigorous in thinking and precise in 
method, and perceive the tender minded as 
being vague and loose. The tender minded, 
on the other hand, consider that they are 
concerned with the important and more 
global dynamics and perceive the tough minded 
as dealing with dinky and sterile statics.

If one is willing to adhere to this dichotomy 
of tough minded-tender minded, it becomes 
relatively simple to classify not only psychol- 
ogists but also their works. Of the six 
formulations I propose to discuss with you, 
four would be classified as tender minded and 
only two as tough minded. This skewed 
distribution acquires additional significance 
when I point out to you that I classify myself 
as tough minded; that is, one who perceives 
himself as a careful investigator dealing with
exact realities and who, in turn, is perceived
as being engulfed in uninspired trivialities.

* Read at the Symposium of the Division of General
Psychology on Conceptual Trends, Fifty-Eighth Annual
Meeting of the American Psychological Association.

It appears to me that the particular nature 
of the concepts chosen for review taken to- 
gether with the particular nature of the 
reviewer constitutes an important fact con- 
cerning trends in industrial psychology. Al- 
though the data fall short of approved statis- 
tical significance, they are nonetheless of 
psychological significance, pointing up the
change occurring in the industrial psychologist,
in the past characteristically tough minded
and a confirmed bigot about it all.

Another characteristic of the concepts that 
I shall consider is that they are the result of 
group thinking. The ideas may have been 
initiated by one individual, but their elabora- 
tion and development have been brought 
about by cooperative effort. However, for 
purposes of simplification of presentation, I 
shall label each with the names of only one 
or two persons, so let me apologize collectively 
to those who go unmentioned. 

The first of the tender minded concepts I 
should like to consider is that of Lewin and 
his colleagues concerning the motivation of 
workers (9, 1, 10). Studies of motivational 
factors in the productivity of industrial 
workers by and large have been either 
“naturalistic” type observations or restricted 
single variable experiments on incentives. 
Thus the concern has been with descriptions 
of the behavior of the worker in his native 
habitat or the push button system in the 
mechanical man. Such approaches cannot 
be said to be particularly helpful in understand- 
ing the causes of production changes or in 
formulating hypotheses fruitful of further 
research. Just before World War II, Lewin’s 
broad interests and enthusiasm led him to 
consider various problems in the field of 
industrial psychology. Among the more excit- 
ing contributions of Lewin and his colleagues 
have been their investigations into, and conceptual
formulations of, the motivational
dynamics of production. 

Rather than seeking verifications of the 
notions of field dynamics in the industrial 
situation, let us adopt them for the moment as 
a frame of reference. Suppose we conceive 
of production level as the resultant of opposing 
forces. There are forces pushing it upward, 
such as the desire among piece rate workers 
for increased earnings, and forces tending to 
depress it, such as the strain of hard work. 
Levels of production, then, are the result of a 
dynamic balance and can be changed either 
by adding or strengthening forces operating 
in the desired direction or by reducing forces 
operating in the opposite direction. 

Let us examine these matters in terms of 
results obtained from studies with power 
sewing machine operators who are transferred 
from one job to another. When one of these 
operators is transferred to a new job, the 
relearning period is much longer than the 
initial learning period of a novice. At first 
glance this might seem to be the result of 
proactive inhibition. Three lines of evidence
dispel this view. First of all, the transferred
workers rarely complain of wanting to do the
job the old way. Secondly, time and motion
studies show few false motions after the first 
week of change. Finally, those who were 
high producers on the old job relearn at the 
same rate as those who were low producers. 
Thus skill appears to be at best a very minor 
factor in the slow relearning. 

On the other hand, the findings pretty clearly 
indicate that the lowered production can be 
attributed to frustration. Management’s pro- 
duction quota was accepted by the workers as 
their goal. This standard became the level 
of aspiration according to which the individual 
and the group gauged success or failure. The 
acceptance of this standard is indicated by 
the fact that the large proportion of non- 
transferred operators had production levels 
at this standard. On being transferred the 
worker ordinarily finds that his production is
low. Even though he receives extra financial 
remuneration for this, his performance is 
markedly below his level of aspiration. His 
frustration at this point is revealed by his 
feelings of failure and his resentment against
management. Just after transfer, labor turnover
is exceedingly high.

I have, of course, given a very meager 
description of the work and thinking of the 
Lewin group. However, it should be sufficient 
to see how the concepts can set the stage for 
further developments, point out fruitful 
avenues of research, and bring together in a 
meaningful way knowledge that has been 
accumulated concerning motivational problems 
of industrial workers. 

By way of example, we can consider the 
so-called human relations training programs of 
which so much is made today. What should 
they accomplish and how should they operate? 
Since increase in tensions, according to the 
notions of field dynamics, leads to aggressive- 
ness and a reduction in constructive activity, 
it follows that in training there should be a 
reduction in forces operating to depress 
production. Furthermore, since distant goals 
will be ineffective in producing desired results, 
levels of aspiration should be set realistically. 
Thus the Lewin group sought to improve the 
situation with respect to transferred workers 
by utilizing group discussion techniques.
Such procedures were intended to reduce the 
forces operating to keep production down 
rather than adding new forces. Furthermore, 
substitute goals were set up for transferred 
workers, goals far below the accepted standard 
of production, but within ready reach of 
achievement. As these were achieved, suc- 
cessively higher goals were introduced. By 
such means relearning was accomplished much 
more readily, and labor turnover was signif- 
icantly reduced. 

This discussion of motivation leads us 
directly to the work of the Likert and Katz 
group on factors in worker morale, the second 
in the tender minded series (14, 8). Their 
objectives have been the determination of the 
conditions for group motivation and produc- 
tivity, and for individual satisfaction. While 
a number of specific researches have been 
accomplished with respect to these objectives, 
one of the most interesting aspects of their 
work is the stress put upon delineations of 
variables. 

The Likert and Katz approach appears to 
start with the establishment of variables
through rational analysis. This is followed 
by field studies designed to get the feel of
things. While exploratory in a larger sense, 
these studies nonetheless have been most 
productive of useful findings. On the basis of 
experience gained, original thinking is re- 
viewed, definitions revised, and hypotheses 
set up for further study. The intent is not 
simply to discover the degrees of relationships 
existing between variables selected for study, 
but rather their modes of interaction. 

By way of illustration, let us follow certain
phases of a study dealing with one of the 
major objectives, namely, conditions of satis- 
faction. Work satisfaction was analyzed 
logically and appeared to have four aspects— 
intrinsic job satisfaction, pride in work groups, 
satisfaction with the company, and financial 
and status satisfaction. Now if these are to 
be considered as truly separable areas, then 
various measures of any one should be more 
highly correlated than measures of different 
ones. Groups of workers had been inter- 
viewed, and appropriate questions from the 
interview schedule were selected on the basis 
of the definitions of work satisfaction. 
Responses to these questions were examined 
and indices of each of the four aspects devel- 
oped. These indices showed quite adequate 
internal consistency and external independ- 
ence. 
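
The check described above, that measures of one aspect should correlate more highly with each other than with measures of a different aspect, can be sketched as follows. The ratings are hypothetical and are not data from the Likert and Katz studies; the aspect labels, question labels, and function names are illustrative only.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return covariance / spread

# Hypothetical ratings from six workers; each key is (aspect, question).
RESPONSES = {
    ("intrinsic job", "q1"): [1, 2, 3, 4, 5, 6],
    ("intrinsic job", "q2"): [2, 1, 3, 5, 4, 6],
    ("company", "q1"): [3, 5, 1, 6, 2, 4],
    ("company", "q2"): [3, 6, 2, 5, 1, 4],
}

def mean_correlations(responses):
    """Mean correlation within the same aspect and between different aspects."""
    within, between = [], []
    for (key_a, xs), (key_b, ys) in combinations(responses.items(), 2):
        (within if key_a[0] == key_b[0] else between).append(pearson(xs, ys))
    return sum(within) / len(within), sum(between) / len(between)

within, between = mean_correlations(RESPONSES)
print(f"mean within-aspect r = {within:.2f}, mean between-aspect r = {between:.2f}")
```

A high within-aspect mean corresponds to internal consistency, and a between-aspect mean near zero corresponds to external independence.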

To follow the type of approach character- 
istic of the Likert and Katz group, let us 
consider their further development of one of 
these phases of work satisfaction, intrinsic 
job satisfaction. As others have found, intrinsic
job satisfaction was discovered to be
related to the content of the work—those 
workers engaged in routine jobs requiring 
little skill showing a much lower degree of 
intrinsic job satisfaction than those on varied 
jobs requiring high skill. 

One must now seek answers to the questions 
concerning the extent to which the individual 
can adapt to the content of the work and the 
extent to which it will be necessary to change 
the work content to adapt to the individual’s 
needs. These questions are far more striking 
than may appear to you. Customarily we 
assume that the content of the work is un- 
changeable. The motion study man may 
change the method but not the essential 
nature of the job. The psychologist has been 
content to accept the job as it stands and to
conceive of his contribution as developing 
techniques for the selection of those persons 
who can do the defined work. Thus Likert, 
Katz, and their co-workers challenge manage- 
ment to explore individual needs further and 
to discover ways of changing jobs in these 
terms rather than just in terms of methods 
of work, duties, and the like. 

One of the many interesting problems with 
which this group has been concerned is worker 
productivity and morale in relation to the 
nature of supervision. Among the questions 
they asked themselves was what types of 
supervisory practices differentiate high from 
low production groups. Again the method was 
the personal interview using open ended 
questions. Comparisons of the responses both 
of supervisors and of subordinates of high and 
low production groups, then, could be made. 

I cannot here review all of the findings of 
these studies. Suffice it to say that they are 
in agreement with those of others. However, 
the Likert and Katz group are able to make 
more specific and better documented con- 
clusions. In general it appears that higher 
levels of production are not achieved by stress 
upon production itself, but rather through 
procedures designed to stimulate ego motiva- 
tions of self-determination, self-expression, 
and personal worth. Again we are provided 
with a framework which integrates and gives 
meaning to human relations training programs. 

The third set of ideas in the tender minded 
series is that of Shartle and his group on 
leadership (12, 11, 13). While still in a rela- 
tively early stage, their work nonetheless has 
had a significant effect upon thinking in the 
field of industrial psychology. Their program 
of research is a long term one and so far they 
have been concerned principally with the 
formulation of concepts and the development 
of methodology. Preliminary research find- 
ings, while few, have been important. 

Their initial step was to attempt definitions 
of the terms leadership and organization. For 
research purposes it was believed that leader- 
ship should be considered in terms of action. 
By virtue of their interaction with other 
persons who are participating in a particular 
goal-oriented activity, leadership resides in 
certain individuals. It exists only insofar 
as members of an organization are differentiated
as to the influence they exert upon that
organization. Leadership, then, becomes 
defined as the process of influencing the 
activities of an organization in its goal setting 
and goal achievement. The value of this 
definition for research purposes is that it 
imposes no restrictions in terms of type of 
leadership or type of organization. 

An organization is conceived of as a special 
kind of group—a group which has the task of 
achieving a common goal and in which the 
members play different roles and are differen- 
tiated in terms of their responsibilities for the 
common task. Measures of structure, author- 
ity, and of leadership, then, are simply meas- 
ures of certain aspects of organization. The 
Shartle group feels that these definitions are 
advantageous first because they integrate 
leadership with the basic variables which 
describe a group and remove it from the 
broad and loosely defined realm of social 
interaction, and second because they suggest 
the development of methodology for studying 
leadership. 

Through job analysis procedures the types of 
activities engaged in by persons in high
organizational positions have been studied. 
A description of a particular leader’s behavior 
can be obtained by determining the proportion 
of time he spends in each. Such descriptions 
furnish data concerning the roles adopted by 
various members of the organization, types of 
leadership exercised, and perceived organiza- 
tional goals. Information concerning who 
spends how much time with whom permits the 
construction of a sociogram. Such sociograms 
have been found to differ significantly from the 
formal organizational structure. Indices have 
been constructed concerning the responsibility, 
authority, and delegation which bear interest- 
ing relationships to sociometric ratings, 
position in the organization, and work patterns. 
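
The two descriptive devices mentioned above, the proportion of time a leader spends in each activity and a sociogram built from who spends how much time with whom, can be sketched as follows. The positions, activities, and hours are hypothetical, and the data layout is an assumption for illustration rather than the Shartle group's procedure.

```python
from collections import defaultdict

# Hypothetical contact log: (person, other person, hours spent together).
CONTACTS = [
    ("manager", "foreman", 6.0),
    ("manager", "engineer", 2.0),
    ("foreman", "engineer", 4.0),
    ("foreman", "manager", 1.0),
]

def build_sociogram(contacts):
    """Undirected weighted graph: total hours shared by each pair of persons."""
    graph = defaultdict(float)
    for person, other, hours in contacts:
        graph[frozenset((person, other))] += hours
    return dict(graph)

def time_proportions(hours_by_activity):
    """Describe a leader by the share of working time spent in each activity."""
    total = sum(hours_by_activity.values())
    return {activity: hours / total for activity, hours in hours_by_activity.items()}

sociogram = build_sociogram(CONTACTS)
profile = time_proportions({"planning": 10, "inspection": 5, "conferences": 25})
for pair, hours in sociogram.items():
    print(" - ".join(sorted(pair)), hours)
print(profile)
```

Comparing the heaviest links in such a graph with the formal chart of positions is one way to see where the working structure departs from the formal one.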

The attempts of Shartle’s group to define 
terms and develop concepts with respect to 
leadership and organizational groups will, I 
think, be of great help in giving new meaning 
and integrating for the industrial psychologist 
the rich researches in group dynamics. Here- 
tofore these researches have not fitted in as 
they should in industrial psychology. Shartle’s 
work undoubtedly will help bridge the gap. 
The last of the tender minded concepts is
that of Haire with respect to industrial peace 
(5, 6). While some psychologists have ev- 
idenced interest in the topic which is now 
labeled industrial peace, their explorations 
have been most tentative. Thus it appears 
to me that the pioneering work of Haire in 
this area assumes particular significance. 
Historically, industrial peace has been a 
problem considered to fall principally within 
the scope of the economist. The economist’s 
approach has been institutional. He has 
concerned himself with external factors and 
has not seen how the problems become 
internalized by management and labor. For 
the economist, industrial peace is said to exist 
when there is no industrial strife. Much of 
this type of peace is fake peace since one or the 
other of the two parties may not resort to 
violent actions simply because it feels that such 
action will severely worsen an already bad 
situation. Thus from the institutional ap- 
proach peace will be considered to exist, but 
phenomenologically there is not peace. 

As a consequence, Haire has felt that the 
most fruitful attack on problems of industrial 
peace is the phenomenological. He starts 
out with the hypothesis that the goodness or 
badness of the relationship existing between a 
particular management and a particular labor 
group is a function of the compatibility of
their role perceptions. Peace occurs when 
each party correctly perceives and accepts the 
role of the other, and in turn feels that the 
role it believes it should play is correctly 
perceived and accepted. Thus a union may 
refuse to settle with management because it 
considers that management is thrusting upon 
it a subordinate role that it should not be 
forced to adopt. 

The tasks for the industrial psychologist, as 
Haire sees it, are two—first, to measure and 
describe the role perceptions of management 
and labor, and secondly, to discover how 
various role perceptions interact in terms of 
resultant industrial peace or disagreement. 
For the present Haire is concerning himself 
with the first of these tasks, seeking to discover 
the kinds of roles that are adopted and the 
kinds of roles each party can see for itself and 
for the other party. 

In studying the role perceptions of management
and labor, Haire has utilized various
techniques, many of which are quite novel. 
Let me describe just two. Content analyses
have been made of the utterances of representa- 
tives of labor and of management. From the 
transcripts of collective bargaining negotiations 
notations have been made of such things as 
who calls whom by first name and who calls 
whom mister, and the kinds of attacks and 
defenses made. Thus actual verbal behavior is 
used to define role perceptions. 

Another device has been the use of indirect, 
or, if you wish, projective methods. From 
a list of characteristics of a hypothetical 
person, the individual is to construct a verbal 
description of him. The characteristics are 
such items as plays jokes, reads the newspaper, 
and goes to union meetings. The results are 
most interesting in indicating the ways in 
which people protect their role perceptions in 
the face of conflicting evidence from the 
environment. A working man who is char- 
acterized among other things as being intel- 
ligent poses a problem. Obviously he cannot 
be both a working man and intelligent. So 
what we do is wrap up the characteristic of 
intelligence somehow or dispose of it by other 
means. We will say that the man possesses a 
kind of low animal cunning, or that he has no 
ambition, or that he is not a working man 
anyway but that he is a foreman. 

Haire’s conceptualizations of factors in 
industrial peace are clearly psychological. 
They serve as a framework by means of which 
past studies can be integrated and suggest 
innumerable exciting research problems. As 
his work proceeds we shall draw closer to 
effective solutions to very real socio-industrial 
problems. 

We now can turn to the first of the concepts 
I am classifying as tough minded. Without 
doubt the area of industrial psychology which 
has stimulated least original thinking is that 
of job analysis. About the only innovations 
in this area have been of the order of changing 
the color of the paper on which the job schedule 
is printed. It is therefore apparent that any 
fresh approach such as Flanagan’s critical 
requirements in work evaluation will provide a 
revitalization of a most important area (2, 3, 
4). 

If one examines the requirements and duties 
for a job that has been well analyzed, the 
resultant picture appears to be quite sensible. 
It is said, for example, that the primary duty 
of a bottle-capping machine operator is to 
“insert the bottle into the capping machine, 
depress the lever thus capping the bottle, 
remove the bottle, and examine for the 
security of the cap.” The primary duty of a 
cashier will be given as “check sales slip, 
make appropriate change.” However mean- 
ingful such job descriptions seem, they are 
not likely to turn out to be either realistic 
or pertinent. Thus one finds that the bottle- 
capping machine operator is commended by 
his superior not because he caps many bottles, 
but because he seldom, if ever, permits his 
machine to run out of caps and thus cause a 
stoppage on the production line. Similarly, 
the cashier is less likely to be fired for being 
slow in making change than for pinching from 
the cash box. 

The important activities connected with the 
job, therefore, are not so much the prescribed 
routine, but rather certain critical aspects. 
In other words, the formally stated job 
activities tend to form a point of indifference 
and those activities that are, as it were, over 
and above the call of ordinary duty, or under 
and below it, are the important ones. 

While I may be stretching the situation 
somewhat, in general it was from considerations 
such as these that Flanagan and his group 
were led to reject standard practices in job 
analysis and to devote their efforts to the 
discovery of critical incidents involved in 
job performance. Their approach puts much 
heavier emphasis on the actual behavior of the 
worker, and seeks to obtain representative 
samples of those kinds of behavior which are 
crucial in the sense that they truly define 
outstandingly effective or definitely unsatis- 
factory performance. 

Through the use of the critical requirements 
approach Flanagan’s group has developed job 
descriptions which are far more realistic than 
those which result from the use of traditional 
methods. In addition, it furnishes a very 
sound basis for the construction of criterion 
measures and rating procedures. By focusing 
attention on the truly critical phases of jobs, 
it may indicate that for many occupations 
formal requirements are not only unrealistic 
but also far too high. This would mean that 
many more persons could be considered to 
meet minimum qualifications, and the result 
would have a significant effect with respect 
to manpower utilization. 

The second, and last, tough minded concept 
that I should like to discuss is Robert Thorn- 
dike’s formulation of personnel classification 
(16, 17). In spite of the fact that matters 
concerning personnel placement have grown to 
be extremely complicated, we view the problem 
of the appraisal of vocational fitness today 
almost exactly as we did at the beginning of 
the century. We operate under the pre- 
supposition that all that is necessary is to 
predict each individual’s relative success or 
failure on each particular job. The hint 
thrown out by Hull (7) twenty-five years ago 
relative to the measurement of differential 
aptitude has received very little attention, and 
Edward Lee Thorndike’s (15) procedure for 
assignment of men has been almost completely 
ignored. It was only after the last war when 
some of us reflected on the manpower problems 
that were brought to the fore that it became 
apparent that personnel placement was a 
much broader matter than simply acceptance 
or rejection of applicants for jobs. 

More and more we are being confronted with 
the situation where we must place in one or 
another of several jobs all available persons, 
with few if any being rejected. Thus during 
the war the classification departments of the 
Army and Navy had to place every individual 
in some occupational specialty. Robert 
Thorndike has concerned himself with this 
problem of personnel classification and has 
made very important beginnings in attempting 
to formulate basic concepts and to state 
problems. 

The situation in which classification arises is 
one where we have N individuals to be assigned 
to N positions in k jobs. The objective is to 
assign all individuals so as to achieve maximal 
over-all effectiveness of the organization. 
It is apparent that under such circumstances 
only limited use can be made of the individual’s 
absolute level of ability and the critical factor 
becomes the differences in level of ability 
for the k jobs. Thorndike points out that 
the situation is complicated by the fact that 
ordinarily certain jobs are more important 
than others. Thus it may be more important 
in a factory to have the best possible assemblers 
rather than the best possible janitors. We 
are further confounded by quotas since we 
may need only a tenth as many janitors as 
assemblers. 
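The classification problem just described can be sketched as a small assignment search. The following is only an illustration under stated assumptions, not Thorndike's own procedure: the function name, the predicted effectiveness scores, the importance weights, and the quota of two assemblers to one janitor are all hypothetical, and the exhaustive search is practical only for very small groups.

```python
from itertools import permutations

def best_assignment(scores, positions, weights):
    """Try every way of filling the positions and keep the best plan.

    scores[i][job] is the predicted effectiveness of person i in that job.
    positions lists one job name per opening, so quotas are expressed by
    repeating a job name. weights[job] is the job's assumed importance.
    """
    best_total, best_plan = float("-inf"), None
    for plan in permutations(positions):
        total = sum(weights[job] * scores[i][job] for i, job in enumerate(plan))
        if total > best_total:
            best_total, best_plan = total, plan
    return best_total, best_plan

# Hypothetical data: person 0 is the best assembler in absolute terms,
# but has by far the largest advantage over the others as a janitor.
scores = [
    {"assembler": 7, "janitor": 9},
    {"assembler": 6, "janitor": 2},
    {"assembler": 5, "janitor": 1},
]
positions = ["assembler", "assembler", "janitor"]   # quota: 2 assemblers, 1 janitor
weights = {"assembler": 2.0, "janitor": 1.0}        # assembling counts double

total, plan = best_assignment(scores, positions, weights)
```

In this made-up case the person with the highest absolute assembling score is placed as janitor, because the differences in ability across jobs, not the absolute levels, determine the best over-all plan.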

As Thorndike has formulated the problem, 
it becomes one of differential prediction. 
That is, rather than attempting to predict 
success in each job separately, we seek to 
predict differences in job success from differ- 
ences in aptitudes. From Thorndike’s math- 
ematical analysis some curious consequences 
emerge. The most critical factor in differential 
prediction is not the absolute validities for 
various jobs. Thus under some circumstances 
a test with lower validity coefficients will be 
more predictive than one with higher coeffi- 
cients. 
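A small numerical sketch may make this consequence less surprising. It uses a standard identity for standardized variables (the correlation of a test with the difference between two criteria) and is not taken from Thorndike's analysis; the function name and all coefficients below are hypothetical.

```python
import math

def differential_validity(r_x1, r_x2, r_12):
    """Correlation of a standardized test x with the difference y1 - y2.

    r_x1, r_x2: validities of the test for job 1 and job 2.
    r_12: correlation between success in the two jobs.
    For standardized variables, cov(x, y1 - y2) = r_x1 - r_x2 and
    var(y1 - y2) = 2 - 2 * r_12.
    """
    return (r_x1 - r_x2) / math.sqrt(2.0 - 2.0 * r_12)

# Test A predicts both jobs well, and equally well.
test_a = differential_validity(0.60, 0.60, 0.50)
# Test B has lower validities, but they differ between the jobs.
test_b = differential_validity(0.40, 0.10, 0.50)
```

Under these assumed values, Test A tells us nothing about which job suits a man better, while Test B, with lower validity for each job taken separately, does.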

Thorndike points out that his analysis is just 
a beginning. But at least he has indicated 
some of the significant factors that must be 
dealt with. Certainly the types of work 
situations need to be considered. If the 
various jobs contribute independently to the 
over-all achievement of the organization, as 
salespersons and janitors in a department 
store, one type of solution will be called for. 
On the other hand, in certain circumstances 
jobs bear a relationship which may be termed 
successive, the item of work being dealt with 
by one individual and then by another. An 
example would be the case of orange sorters 
and packers, where the production of the 
latter is contingent upon that of the former. 
In this instance the classification problem 
would seem to be different. Finally, we have 
jobs that are coordinate, where the work is 
produced through the group effort of several 
persons. The riveter and the bucker-upper 
furnish a case in point. Here one would have 
to consider not only relative success on 
different jobs, but also placement within 
working groups. 

The implications of classification for meas- 
urement and prediction are many. I have 
no doubt but that this analysis of Thorndike’s 
prefaces a long history of developments which 
will lead to views strikingly different from 
those held today. 

The picture I have given you of current 
thinking in the field of industrial psychology 
is admittedly biased. This is not only the 
result of my particular selection of conceptual 
formulations for review, but also of the projec- 
tion upon them of my own notions. Further- 
more, my review has been completely 
uncritical. The explanation of this lies in 
the fact that for the industrial psychologist 
these formulations are extremely stimulating. 
He wishes that he could participate actively 
in all of them. But the field is so broad and 
presents so very many exciting research 
problems that he must content himself with 
one bit. 


Received September 29, 1950. 


An Analysis of a Point Rating Job Evaluation Plan * 


Donald L. Grant 
The Ohio State University 


In recent years there have been a number 
of analyses made of job evaluation plans (2, 
3, 4, 5, 6, 7, 8). Employing factor analysis and 
multiple correlation methods each of these 
studies has demonstrated: (1) a lack of in- 
dependence among the rating variables (job 
factors) ; (2) low relationships between some of 
the rating variables and the measure of job 
worth (grades, classes, or total points); and (3) 
that a very few of the rating variables can, for 
all practicable purposes, accurately predict the 
measure of job worth. The present article 
reports another such study (1). 

In the study reported in the present article 
an analysis was made of a point rating job 
evaluation plan covering the clerical workers 
of a large manufacturing organization. Under 
the plan studied jobs are rated on eighteen 
job factors. Points are assigned jobs on each 
of the job factors according to a job-to-job 
comparison method. The factors are weighted 
on an a priori basis according to the total 
number of points that any job may be assigned 
on each of the factors. The total number of 
factor points for each job is obtained and a labor 
grade assigned the job according to its point 
total. The plan provides for 21 labor grades. 
The labor grade for each job was employed in 
the study reported in this article as the 
measure of job worth. 
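The general mechanics of such a plan can be sketched as follows. This is an illustration only: the article does not give the rule for converting point totals to labor grades, so the equal-width bands are an assumption, as are the function name and the example job. Only three of the eighteen factors are shown, with the maximum points listed for them in Table 3.

```python
def labor_grade(factor_points, max_points, n_grades=21):
    """Sum a job's factor points and place the total in one of n_grades bands.

    Equal-width bands over the possible point range are assumed here;
    the plan's actual conversion table is not reported.
    """
    for name, points in factor_points.items():
        if not 0 <= points <= max_points[name]:
            raise ValueError(f"points for {name} outside allowable range")
    total = sum(factor_points.values())
    band_width = sum(max_points.values()) / n_grades
    return min(n_grades, int(total // band_width) + 1)

# Maximum points per factor act as the a priori weights.
max_points = {"education": 48, "previous_experience": 36, "complexity": 25}
# Hypothetical job rated on the three factors.
job = {"education": 24, "previous_experience": 18, "complexity": 10}

grade = labor_grade(job, max_points)
```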


Method 


Of over three hundred clerical jobs in the 
company whose plan was studied, a random 
sample of one hundred was selected for 
analysis. Points assigned each of the job 
factors (such as “experience” and “com- 
plexity”) for each of the jobs in the sample 
were punched on IBM cards and intercor- 
related by a method designed by Toops (11). 
In addition, the rating variables were cor- 
related with the measure of job worth. 


* The study reported in the present article is a sum- 
mary of a thesis submitted in partial fulfillment of the 
requirements for the Master of Arts degree at The Ohio 
State University. The author wishes to acknowledge 
the assistance and guidance given him by Dr. Harold E. 
Burtt, Dr. Herbert A. Toops, and Dr. Robert J. Wherry 
in the preparation of the thesis. 

A method of factor analysis was employed 
to interpret the interrelationships among the 
job factors. The method employed is a 
modification of the Thurstone centroid method 
(10). The modified method is a multiple 
group method developed by Wherry, Brogden, 
Gaylord, and Taylor of the Personnel Research 
Section, Office of the Adjutant General, 
Department of the Army. It is as yet 
unpublished. Communality estimates were 
corrected, minimizing residual error, by an 
iterative method introduced by Wherry (12). 
The measure of job worth was added to the 
factor matrix by this iterative method. 

The multiple correlation of the job factors 
with the measure of job worth and the regres- 
sion weights (standard and gross score) for 
the job factors were computed. The shrunken 
multiple correlation of the job factors with the 
measure of job worth, the job factors entering 
the solution, and their order of entrance were 
determined by means of the Wherry-Doolittle 
test selection method (9). 
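The idea of a shrunken multiple correlation can be illustrated with the familiar adjusted form often associated with Wherry. The exact variant applied within the Wherry-Doolittle procedure is not stated in the article, so the formula below is an assumption, and the observed correlation of .995 is a hypothetical value chosen only for the example.

```python
import math

def shrunken_r(r_observed, n_cases, n_predictors):
    """Shrink an observed multiple R for the number of predictors used.

    Assumed form: R'^2 = 1 - (1 - R^2) * (N - 1) / (N - m - 1).
    A negative adjusted R^2 is reported as zero.
    """
    adjusted = 1.0 - (1.0 - r_observed ** 2) * (n_cases - 1) / (n_cases - n_predictors - 1)
    return math.sqrt(max(0.0, adjusted))

# Hypothetical observed R with 100 jobs and 15 rating variables.
r_shrunk = shrunken_r(0.995, n_cases=100, n_predictors=15)
```

The correction grows as predictors are added relative to the number of cases, which is why a selection method that stops adding variables early loses little.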


Results 


The intercorrelations of the job factors and 
the correlations of the job factors with the 
measure of job worth appear in Table 1. 
The final rotated and corrected loadings for 
each of the rating variables and the measure 
of job worth on each of the obtained factors 
appear in Table 2. The obtained mean of the 
residuals (for the 18 job factors) was −.002 
± .002. The residuals appear in Table 1 
along with the intercorrelations of the job 
factors. 

Fifteen of the 18 rating variables entered 
the solution for the multiple correlation 
between the job factors and the measure of 
job worth. A shrunken multiple correlation 
of .992 was obtained. The variables and 
their order of entering the solution are given 
in Table 3. The first six of the job factors 
entering the solution gave a multiple of .987. 
Table 3 also gives the range of allowable 
points, mean, and standard deviation for 
each job factor and the regression weights for 
the job factors entering the solution. In 
addition, the bottom row presents the total 
range of points for all factors, the mean labor 
grade, and the standard deviation of the 
labor grades. A standard error of estimate 
of .509 (approximately ½ labor grade) was 
obtained. 
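As a rough check on these figures, the usual textbook relations can be applied to the values reported in the text and in Table 3. The formulas are the standard ones and are assumed, not quoted from the article; small discrepancies from the printed values are to be expected from rounding.

```python
import math

sd_labor_grade = 4.021   # standard deviation of labor grades (Table 3)
shrunken_r = 0.992       # shrunken multiple correlation

# Standard error of estimate: SD of the criterion times sqrt(1 - R^2).
see = sd_labor_grade * math.sqrt(1.0 - shrunken_r ** 2)

def gross_weight(beta, sd_factor, sd_criterion=sd_labor_grade):
    """Convert a standard (beta) weight to a gross-score weight."""
    return beta * sd_criterion / sd_factor

# Beta weights and standard deviations as printed in Table 3.
complexity = gross_weight(0.192, 5.425)
previous_experience = gross_weight(0.291, 8.748)
```

Both results agree closely with the printed values: about one-half labor grade for the standard error, and gross weights near .142 and .134 for the two factors.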


Discussion 


An examination of the correlation matrix 
(Table 1) reveals that the job factors are not 
independent of each other. Several of the 
intercorrelations are high. It is also apparent 
that several of the job factors (e.g., “working 
conditions”) bear little relationship to the 
measure of job worth. 

Inspection of the factor matrix (Table 2) 
clarifies the relationships observed in the 
correlation matrix. Ten factors were obtained. 
Only six of the ten factors, however, have 
loadings on the measure of job worth. Of 
the six that have loadings on the measure of 
job worth, one (Factor I) accounts for nearly 
two-thirds of the variance in the labor grades. 
This factor, interpreted as a “skill demands” 
factor, is comprised of those skills a worker 
brings to the job. Another factor (Factor V) 
accounts for 16% of the job worth variance. 
This is a responsibility factor involving the 
responsibility for the non-disclosure of informa- 
tion and knowledge of a confidential nature. 
Jobs highly confidential in nature are in most 
cases secretarial and stenographic jobs with 
relatively high educational requirements. A 
third factor (Factor III) was also interpreted 
as a responsibility factor, namely responsibility 
for correct handling of financial matters and 
for customer goodwill. This factor accounts 
for around 10% of the variance of the measure 
of job worth. A fourth factor (Factor IV) 
was interpreted as a supervisory demands or 
leadership factor. Because only jobs paid on 
an hourly basis are included in the plan 
studied, this factor does not account for much 
(approximately 5%) of the variance of the 
measure of job worth. The remaining factors 
that account for any of the variance of the 
measure of job worth were named “Respon- 
sibility for Effect on Subsequent Work” and 
“Length of On-the-Job Learning Period” 
(Factors II and VIII, respectively). To- 
gether they account for only 3% of the job 
worth variance. 

The job evaluation system under discussion 
can thus be described by ten rather than 18 
factors. Of the ten only six bear any relation- 
ship to the measure of job worth, four account- 
ing for 97% of its variance, and only one for 
66% of its variance. The analysis thus 
bears out previous studies in this regard, 
indicating that present job evaluation systems 
base their methods of judging job difficulty on 
a very limited number of actual factors. 
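The percentages quoted above can be tallied directly. The figures are the approximate ones given in the discussion; the short labels are informal and not the author's.

```python
# Approximate share of labor grade variance, in percent, by factor.
shares = {
    "I (skill demands)": 66,
    "V (confidential information)": 16,
    "III (monetary and goodwill)": 10,
    "IV (supervisory demands)": 5,
}

top_four = sum(shares.values())   # the four factors named in the text
remaining = 100 - top_four        # left for Factors II and VIII together
```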

Though the results of computing the multiple 
correlation of the job factors with the measure 
of job worth bear out essentially the results 
of previous studies, it is a question as to 
just what this part of the analysis contributes 
to our understanding of current job evaluation 
systems. It strikes the author that the 
results of the factor analysis are much more 
informative, and that the results of the 
correlation analysis merely tell us that the 
present rating variables are so interdependent 
that a few of them will predict the total 
rating nearly as well as do several. The results 
of the factor analysis also give us this informa- 
tion and, in addition, inform us as to how many 
truly independent factors underlie the deter- 
mination of job worth. 

It should be pointed out that the analysis 
just discussed as well as similar analyses omit 
such pertinent problems as the validity and 
reliability of job evaluation systems, a priori 
vs. empirical methods for weighting the 
rating variables, the effects of bias in rating 
jobs, the validity and reliability of job descrip- 
tions and specifications upon which the 
judgments of job difficulty are made, the 
number of difficulty levels that can be discrim- 
inated, and the attitudes of employees toward 
the justice of the relative wages or salaries 
that are established by the current systems of 
job evaluation. Some of these problems have 
been dealt with in a few published studies, 
but the research in this field so far is limited 
in scope and in quantity. 


Table 2 

Final Factor Loadings 

(1) Education 
(2) Previous Experience 
(3) Learning Period 
(4) Complexity 
(5) Analysis 
(6) Initiative 
(7) Accuracy 
(8) Tact 
(9) Dexterity 
(10) Executive Ability 
(11) Leadership 
(12) Monetary and Good Will 
(13) Confidential 
(14) Effect on Subsequent Work 
(15) Monotony 
(16) Strain 
(17) Working Conditions 
(18) Accident 
(C) Labor Grade 


Table 3 

Data Relevant to Correlation Analysis 

                                 Range of   Standard    Beta     Gross 
Job Factor                       Points     Deviation   Weight   Weight 

(4)  Complexity                  0-25       5.425       .192     .142 
(2)  Previous Experience         0-36       8.748       .291     .134 
(1)  Education                   0-48       7.920       .223     .13 
(6)  Initiative                  0-20       4.694       .112     .086 
(7)  Accuracy                    0-25       5.295       .171     .129 
(8)  Tact                        0-10       2.460       .100     .162 
(3)  Learning Period             0-9        1.797       .086     .192 
(10) Executive Ability           0-10       1.584       .031     .077 
(5)  Analysis                    0-25       5.907       .105     .072 
(11) Leadership                  0-8        1.294       .05      .155 
(14) Effect on Subsequent Work              0.949                .161 
(9)  Dexterity                              1.010                .139 
(16) Strain                                 0.222                .484 
(12) Monetary and Good Will                 1.261                .087 
(13) Confidential                           1.590                .056 
(15) Monotony                               0.292       -        - 
(17) Working Conditions                     0.332       -        - 
(18) Accident                               0.347       -        - 

(C)  Labor Grade                            4.021 


Summary 


An analysis of a point rating job evaluation 
plan covering clerical workers was presented. 
The methods of analysis used were the statis- 
tical ones of factor analysis and multiple 
correlation. The results of the analysis bear 
out the results of previous similar analyses in 
demonstrating: 


(1) the lack of independence among the 
rating variables (job factors) ; 

(2) the lack of relationship between some 
of the rating variables and the measure of 
job worth; and 

(3) the fact that only a few of the rating 
variables can, for all practicable purposes, 
accurately predict the measure of job worth. 


Received September 15, 1950. 


Use of Biographical Inventory in the Air Force Classification Program * 


Abraham S. Levine and Virginia Zachert 


USAF Training Command, Human Resources Research Center, Directorate of Personnel Research, 
Lackland Air Force Base, San Antonio, Texas 


Since World War I, biographical data of 
various kinds have been used in the selection 
of individuals for occupational specialties. 
During the period between the two World 
Wars positive relationships were reported 
between selected items of biographical informa- 
tion and success of students, salesmen, indus- 
trial and military specialists (2, 3, 4, 7). In 
most of these early applications, items concern- 
ing socio-economic status, education, previous 
experience or success in related occupations 
were used on the basis of real or presumed 
validity for the particular criterion. 

More extended use of biographical data was 
made during World War II. Numerous 
questions on biographical information were 
assembled into inventories which were keyed to 
yield a total predictive score. Air Force 
psychologists constructed new biographical 
items and borrowed promising items from 
civilian aviation and industrial personal history 
blanks, and personality questionnaires. From 
this effort emerged a “Biographical Data 
Blank” which became an integral part of the 
Aircrew Classification Battery. The items in 
this inventory concerned largely the applicant’s 
previous experience which appeared relevant 
to, or indicated strong interest in, the compo- 
nent activities revealed by a job-analysis of 
the aircrew position concerned. Questions 
were asked about hobbies, intellectual, physical 
and social activities. Also included were 
items which were indicative of socio-economic 
status. These items were empirically keyed 
and then cross-validated for predicting pilot 
and navigator training success. 

The validities for the Biographical Data 
Blank, for both the pilot and navigator 
criteria, bounced consistently around .30 
during the war (6). The two scores also 
correlated low with the other tests in the 
battery and hence contributed significant 
unique variance to both the pilot and navigator 
composite test scores. 


* The views expressed in this article are those of the 
authors and do not necessarily represent the official 
views of the United States Air Force. 


Development 


After World War II, the Air Force was 
again confronted with the problem of classify- 
ing large groups of men for training. The 
technical specialties for which basic trainees 
could receive appropriate training numbered 
approximately 150. In order to cope with the 
classification problem for enlisted specialties, 
a comprehensive full-day battery of about 15 
objective tests was devised. After school 
grades and other criterion data were available, 
composite test scores called Aptitude Indices¹ 
were established for eight job families of 
related specialties. The development of this 
battery is described by Dailey (1). 
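The Aptitude Index scale (1 to 9 in half standard deviation units, with a mean of 5) can be sketched as a simple conversion. The sketch assumes that the middle band is centered on the mean, as in a stanine scale; the article gives only the range, the unit, and the mean, and the function name and example scores are hypothetical.

```python
import math

def aptitude_index(composite, mean, sd):
    """Convert a composite score to a 1-9 index in half-SD bands.

    Assumes band 5 spans the mean plus or minus one quarter of a
    standard deviation, with the extreme bands open-ended.
    """
    z = (composite - mean) / sd
    band = math.floor(2.0 * z + 0.5) + 5
    return max(1, min(9, band))

average_man = aptitude_index(50, mean=50, sd=10)     # at the mean
one_sd_above = aptitude_index(60, mean=50, sd=10)    # one SD above the mean
```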

Included in the early Airman Classification 
Battery was a 378 item biographical inventory 
administered in three booklets. About 50 
questions asked for the strength of preference 
for representative technical specialties. The 
remaining items tapped educational back- 
ground, socio-economic status and participa- 
tion in activities directly related to the 
technical specialties of the post-war Air Force. 
A strong attempt was made to elicit factual- 
type information and to avoid items requiring 
introspection or self-ratings. 

On the basis of item analysis data, including 
item validities for the various specialties, a 
separate empirically-derived key was developed 
for each of the specialty families. Also it was 
possible to shorten the original three-hour, 
378 item inventory to a one-hour inventory of 
125 items. 


1 An Aptitude Index is a composite test score con- 
verted to standard score form and ranges in one-half 
standard deviation units from 1 (lowest) to 9 (highest) 
with a mean of 5. 


Keying by Patterns of Response 


Before reporting representative validities of 
the Biographical Inventory, it might be 
profitable to describe a method of keying 
items developed by Cowles and Dailey which 
has been adopted as the standard method of 
keying most Air Force Biographical Inven- 
tories (5). This procedure is called “Keying 
by Patterns of Responses.” Essentially, it 
is an attempt to key on the basis of multiple 
probability where groups of responses or 
groups of items are considered simultaneously. 
First, the items are grouped into apparently 
homogeneous pools such as hobbies, athletic 
participation, or success in school subjects. 
In each group, if the number of phi coefficients, 
significant at any given level, does not sub- 
stantially exceed the number to be expected 
by chance, then the entire group of items is 
disregarded in making the key. On the other 
hand, if a group of items has been determined 
to be valid as a whole, then the validity phi 
coefficients for all responses to each item are 
considered simultaneously. Thus, if a regular 
gradient is evident in the responses to one 
item, then the responses at each end of the 
continuum will be assigned plus or minus 
weights and the mid or neutral point will be 
weighted zero. 

Table 1 

Product-Moment Correlations Between Biographical Inventory (BE601B) Raw Scores and 
Technical School Final Grades 

[Table body not recoverable. Columns: Biographical Inventory keys, among them Clerical, Equipment Operator, Electronics, Craftsman, and Instructor. Rows: technical schools, namely Aircraft Welder; Airplane and Engine Jet Mech.; Airplane and Engine Mechanic; Airplane Electrical Mech.; Airplane Sheet Metal Worker; Ammunition Supply Technician; Control Tower Operator; Diesel Mechanic; Draftsman; Electrician; Engineman Operator; Fabric and Dope Mechanic; Firefighter and Crash Rescue; Machinist; Parachute Rigger; Plumber; Radar Mechanic; Sheetmetal Worker; Stenographer; Clerk Typist; Carpenter; Weather Observer; Photographer; Automotive Equip. Technician.] 


An illustration of how keying by patterns of 
responses differs from conventional keying 
methods is given by the following examples 
of phi coefficients obtained in validation of 
responses of two items: 


1. A +.12; B +.08; C +.01; D −.07; E −.11. 
2. A +.23; B −.14; C +.06; D −.04; E +.17. 


In Item 1 we have a gradation of phi 
coefficients for each successive response on 
the response continuum. Using the procedure 
of keying by patterns of responses this item 
would have responses A and B keyed positively, 
response C keyed zero and responses D and E 
keyed negatively even though no one of the 
five responses was significant at a very high 
level. If the conventional method were used 
none of these responses would be keyed. 

In Item 2, response A was the only one 
which had a phi coefficient significant at the 
5% level. Using the conventional method, 
this response would have received a positive 
weight. However, since a gradation of phi 
coefficients is not evident none of these 
responses would be keyed by the procedure 
which takes into account the patterning of 
responses. 
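The contrast between the two keying methods for these two items can be sketched as follows. This is a simplified illustration, not the Cowles and Dailey procedure itself: the critical value of .20 for significance at the 5% level and the neutral band of .05 are hypothetical (the sample size is not given), the gradient is tested simply as a monotone ordering, and the preliminary screening of whole item groups is omitted.

```python
def conventional_key(phis, critical=0.20):
    """Key each response on its own: weight it only if its phi is significant."""
    return [(1 if p > 0 else -1) if abs(p) >= critical else 0 for p in phis]

def pattern_key(phis, neutral=0.05):
    """Key an item only if its phis form a regular gradient across responses.

    Responses toward each end get plus or minus weights; responses whose
    phi lies inside the neutral band are weighted zero.
    """
    pairs = list(zip(phis, phis[1:]))
    rising = all(a <= b for a, b in pairs)
    falling = all(a >= b for a, b in pairs)
    if not (rising or falling):
        return [0] * len(phis)
    return [0 if abs(p) < neutral else (1 if p > 0 else -1) for p in phis]

item_1 = [0.12, 0.08, 0.01, -0.07, -0.11]   # responses A through E
item_2 = [0.23, -0.14, 0.06, -0.04, 0.17]

item_1_pattern = pattern_key(item_1)
item_1_conventional = conventional_key(item_1)
item_2_pattern = pattern_key(item_2)
item_2_conventional = conventional_key(item_2)
```

Under these assumptions Item 1 is keyed plus, plus, zero, minus, minus by the pattern method and not at all by the conventional method, while Item 2 receives a single positive weight on response A by the conventional method and none by the pattern method.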

A comparative study by Lecznar (5) of 
the Airman Biographical Inventory demon- 
strated a strong tendency for keys based on 
keying each response independently, according 
to the 5% significance level of its validity, 
to exceed the validity of pattern of response 
keys when applied to the same sample on 
which the keys are based. However, when 
cross-validated the pattern of response keys 
shows higher validity and less shrinkage than 
the conventional key. The results of this 
study also indicate that, in the case of con- 
ventional keys based upon significance at the 
1% and 5% level, the 1% keys show less 
shrinkage and higher cross-validity than the 
5% keys. These data support the generaliza- 
tion that the key which capitalizes least upon 
chance variance in the keying sample will 
have the least validity for this sample, but 
will have maximum validity and minimum 
shrinkage upon cross-validation. 


Validities 


The eight keys for the Biographical Inven- 
tory were empirically derived as described. 
Table 1 shows the validities of these eight 
biographical inventory keys. Final grades 
from 24 technical schools were used as criteria. 
All cases on which final school grades were 
available were used; the samples varied from 
49 to 1,731 cases. The reported correlations 
represent cross-validities since the eight keys 
were developed on prior samples to those 
indicated in Table 1. The Airmen were given 
the classification battery between February 7, 
1948 and November 15, 1948, while in basic 
training at Lackland Air Force Base. 


Conclusions 


It is evident from Table 1 that in the case 
of most specialties, the Biographical Inventory 
keys have appreciable validity for predicting 
various technical school criteria. Since the 
Biographical Inventory keys correlate neg- 
ligibly with most other tests in the battery 
(.00 to .30), they also make a substantial 
contribution to the multiple R’s for predict- 
ing these criteria. Table 1 also reveals that in 
many instances, several keys are valid for a 
single criterion. Since the Biographical In- 
ventory keys have rather low intercorrelations 
with each other (median intercorrelation .26), 
the weighting of several of these keys into 
each Aptitude Index augments the contribution 
of the Biographical Inventory to the predictive 
efficiency of the Airman Classification Battery. 

All of the above mentioned contributions of 
the Biographical Inventory represent sufficient 
conditions for increasing the multiple selection 
efficiency of the battery. However, the task 
of classification is to assign individuals to the 
jobs in such a way that the average success of 
all the individuals in all the jobs to which they 
are assigned will be a maximum. Thus, 
classification requires that, in addition to 
predicting probabilities of success in various 
occupational specialties, the relative fitness of 
an individual for these occupations must also 
be determined. Since the median intercorrela- 
tion of .26 between the eight keys of the 
Biographical Inventory is much lower than 
the intercorrelations between the respective 
Aptitude Indices (median intercorrelation .64), 
the Biographical Inventory makes a substantial 
contribution to differential classification over 
and beyond the increase in predictive efficiency 
for the various criteria. Through its effect 
in reducing the intercorrelations between the 
various Aptitude Indices, the Biographical 
Inventory increases the prediction of the 
relative fitness of an individual for a number 
of different occupational specialties and makes 
possible a more efficient utilization of the 
available pool of talent in the Air Force. 


Received October 23, 1950. 


Factor Analysis of Clerical Aptitude Tests 


John T. Bair 
Naval School of Aviation Medicine, Pensacola, Florida 


This exploratory study of clerical aptitudes 
was undertaken for the principal purpose of 
discovering the structure of clerical aptitudes. 
The clerical field embraces a large group 
of occupations, and consequently the measure- 
ment of clerical aptitudes has received intense 
study. This is fortunate, since tests to help 
in the estimation of success in these occupations 
are among the most necessary aids in employ- 
ment work. 

Although there are many different kinds of 
clerical jobs, it seems plausible that most of 
these jobs are related in various ways. Other 
workers have attempted to describe these 
relationships in terms of basic or primary 
abilities (1, 2, 4, 6). However, it seems that 
much remains to be done in the way of describ- 
ing the dynamic character of clerical aptitudes 
and in evaluating the tests that measure them. 

This present study, then, is concerned with
the application of factor analysis techniques
to clerical aptitude test data. A carefully 
planned analysis of representative clerical 
aptitude tests should be worthwhile both from 
the standpoint of test evaluation and of test 
construction. 


Procedure 


A battery of seventeen clerical aptitude
tests and one general intelligence test was
given to a homogeneous group of 194 high 
school commercial students. 

Descriptions of the tests selected and the
particular subsections used are given below.
The Psychological Corporation has published a 
report which describes these tests, with the 
exception of Tests 9 and 18, in more detail 
(3). Some of the tests were not presented in 
their entirety. Obviously, this may have had 
some influence on the reliabilities of these 
particular tests. However, it seemed justifi- 
able to include only specific subsections of 
particular tests in order to permit the use of 
a greater variety of tests in the battery. 


1. Number Comparison: This is Test 1 from 
the Minnesota Clerical Test. It consists of 
pairs of numbers which are to be compared for 
similarities or differences. 

2. Name Comparison: This is Test 2 from 
the Minnesota Clerical Test. This test re- 
quires the examinee to compare names for 
similarities or differences. 

3. Number Checking: This is Test 1 from 
the Test of Clerical Competence. The exam- 
inee is to verify typewritten numbers against 
handwritten numbers. The numbers are not 
placed exactly opposite each other as in the 
Minnesota Clerical Test. 

4. Name and Address Checking: This is Test 
2 from the Test of Clerical Competence. The 
examinee is to check handwritten names and 
addresses against others that are typed. Again 
the pairs do not lie exactly opposite each other. 

5. Number Checking: This is Part 2, sections 
1, 2, and 3, of the Clerical Aptitude Test. 
The examinee has to check the smallest number
in section 1, the second largest in section 2,
and second smallest in section 3. 

6. Number Checking: This is Part 2, section 
4 of the Clerical Aptitude Test. The subject 
checks pairs of numbers that are the same or 
different. This test seems to be remarkably 
similar to the Minnesota Number Checking 
Test. 

7. Speed of Writing: This is Test 1 from the 
ERC Stenographic Aptitude Test. The ex- 
aminee must copy legibly the Gettysburg 
Address from a printed copy. 

8. Word Discrimination: This is Test 2 from 
the ERC Stenographic Aptitude Test. The 
examinee is required to choose the correct 
homonym which best completes a sentence. 

9. Copying Numbers: This is Test 3 from 
the IER General Clerical Test. The examinee 
copies numbers from one side of the page to 
another. 

10. Copying: This is Test 4 from the NIIP 
Clerical Test (American Revision). A series 
of names, initials and numbers are to be copied 
exactly from one column to another. 

11. Checking Copy: This is Part I from the 
General Clerical Test. The examinee is re- 
quired to check the accuracy of copy at the 
bottom of a page with the original at the top. 

12. Spelling: This is Part VI from the Gen- 
eral Clerical Test. The examinee is to rewrite 
incorrectly spelled words. 

13. Alphabetical Filing: This is Part II from 
the General Clerical Test. The examinee is
to locate names alphabetically in a pictorial
representation of a file drawer. 

14. Filing: This is Test 6 from the NIIP 
Clerical Test (American Revision). The ex- 
aminee files a group of numbers and names. 
The names are to be arranged alphabetically. 

15. Classification: This is Test 2 from the
NIIP Clerical Test (American Revision). The 
examinee is to arrange under the proper head- 
ing such items as postage stamps, coal, gro- 
ceries, etc. 

16. Business Classification: This is Test 3 
from the Test of Clerical Competence. The 
examinee sorts letters for referral to different 
industrial departments. 

17. Arithmetic: This is Part 3 from the 
Detroit Clerical Aptitude Examination. The 
examinee is to solve simple arithmetical com- 
putation problems. 

18. Otis S-A Tests of Mental Ability: This 
is the Higher Examination form of this test. 
The examinee is required to solve general 
mental ability problems designed for high 
school and college freshmen students. This 
test becomes variable 36 in the correlation 
matrix and in Table 2. 


Thirty-six test variables were obtained from 
this battery, since it was decided to include 
the “error” scores on the seventeen clerical 
tests as separate variables in addition to the 
“correct” answer scores. This was done in
order to give some clues to the organization
of error scores and to see if they tended to
cluster on one factor. Age was also included
as an original variable. Product moment 
correlations were then computed from these 
scores. It was found, however, that the 
correlations of age and errors in speed of 
writing with the other variables were not 
significant at the 5 per cent level of confidence; 
therefore, they were dropped from the correlation
matrix.

Table 1 presents the correlation matrix of the
thirty-six variables.¹ It was subjected to
Thurstone’s method of centroid or multiple
factor analysis (5). Thurstone’s method was
used because of the greater flexibility and
more consistent and psychologically significant
factors that could be obtained. Three significant
factors were extracted from the original
correlation matrix. These three factors then
were rotated into a satisfactory approximation
of a simple structure. The factor loadings for
the variables after rotation are given in Table
2.

¹ To reduce printing costs Table 1 has been deposited
with the American Documentation Institute. Order
Document 3180 from American Documentation Institute,
1719 N Street, N.W., Washington 6, D. C., remitting
$1.00 for microfilm (images 1 inch high on
standard 35 mm. motion picture film) or $1.00 for
photocopies (6 X 8 inches) readable without optical aid.


Results 


Interpretation of the factors in this study 
seems to have led to a fairly satisfactory 
identification of all three. A loading of .40 
is regarded as significant. 

It is evident from Table 2 that factor I 
has the two highest loadings on clerical tests 
where accuracy in perceiving comparisons 
between names and numbers, and not motor 
speed, pays the premium in good scores. The 
highest loading is on Test 5 which requires 
the ability to perceive the smallest, the second 
largest, and the second smallest numbers in a 
series. Test 5 also has zero loadings on 
factors II and III which means it has signif- 
icance only for factor I. Test 11 has nearly as 
high a loading as Test 5. It requires the 
ability to perceive errors in copy at the bottom 
of a page when comparing it with a correspond- 
ing line of the original at the top. Both of 
the tests include more of a perceptual span 
in observing and comparing than any of the 
other tests. Test 10 is very similar to Test 
11 in that accuracy is needed in copying 
names, numbers, and letters in the proper 
spaces from one side of a page to another. 
Tests 1, 2, and 3 demand accuracy in the 
perception of similarities or differences of 
numbers and names when they appear opposite 
or nearly opposite each other. Tests 13, 16, 
and 36 contain items that require ability to 
make visual discriminations. Since most of 
the “error” scores have negative loadings on 
this factor, accuracy in all the above dis- 
criminations seems very essential. The 
“error” score negative loadings were on the 
same tests that had positive loadings for the 
“correct” answers. This means that the 
“error” scores mainly corroborated the speculation
as to the factors, rather than revealed
any new concepts concerning the structure of 
“error” scores. 

On the other hand, the zero side of
the ledger contains the test which consists
primarily of motor speed, Test 7. Another
test with a fairly low loading is Test 9, copying
numbers, which again requires a great deal of
speed for a high score. In light of the foregoing,
factor I can be described as Perceptual
Analysis, with span and accuracy playing
major parts, and speed of movement playing
an almost vanishing role. Factor I accounts
for 17 per cent of the total variance.


Table 2

The Rotated Factor Matrix

                                           Factor I     Factor II    Factor III
                                           Perceptual   Speed        Comprehension
                                           Analysis                  of Verbal
Variable Number and Description                                      Relationships

 1. Number Comparison                        [...]        [...]         .12
 2. Name Comparison                          [...]        [...]         .43
 3. Number Checking                          [...]        [...]         .24
 4. Name and Address Checking                [...]        [...]         .44
 5. Number Checking (rank order)             [...]        [...]         .00
 6. Number Checking (comparison)             [...]        [...]        -.05
 7. Speed of Writing                         [...]        [...]         .00
 8. Word Discrimination                      [...]        [...]         .67
 9. Copying Numbers                          [...]        [...]         .43
10. Copying                                  [...]        [...]         .17
11. Checking Copy                            [...]        [...]         .23
12. Spelling                                 [...]        [...]         .59
13. Alphabetical Filing                      [...]        [...]         .43
14. Filing                                   [...]        [...]         .43
15. Classification                           [...]        [...]         .54
16. Business Classification                  [...]        [...]         .48
17. Arithmetic                               [...]        [...]        [...]
18. Errors in Number 1                       [...]        [...]        [...]
19. Errors in Number 2                       [...]        [...]        [...]
20. Errors in Number 3                       [...]        [...]        [...]
21. Errors in Number 4                       [...]        [...]        [...]
22. Errors in Number 5                       [...]        [...]        [...]
23. Errors in Number 6                       [...]         .05          .08
24. Errors in Number 7                       Not included in original factor matrix
25. Errors in Number 8                       [...]         .35         -.51
26. Errors in Number 9                       [...]         .02         -.26
27. Errors in Number 10                      [...]         .04         -.10
28. Errors in Number 11                      [...]         .28          .03
29. Errors in Number 12                      [...]         .10         -.49
30. Errors in Number 13                      [...]         .05         -.22
31. Errors in Number 14                      [...]         .24         -.36
32. Errors in Number 15                      [...]         .27         -.28
33. Errors in Number 16                      [...]         .44         -.25
34. Errors in Number 17                      [...]         .41          .01
35. Age                                      Not included in original factor matrix
36. Mental Ability (Otis Higher Level)       [...]        -.08          .57


As evidenced from Table 2, the six highest
loadings for factor II are on tests which appear
related to speed. Test 7 seems to be almost
a pure test of speed. Tests 1, 2, 3, 4, and 10 
can certainly be viewed as tasks in which 
speed would be a distinct asset. 

Variable 33, errors in Business Classification, 
although it consists of the “error” scores on 
the test, is positively weighted on this factor. 
This would seem to indicate that those subjects 
who indiscriminately rushed through this
test made the better showing on this factor.
Further corroboration seems to be furnished
by the tests revealing zero loadings. These
are 5, 8, 16, and 36. It seems within reason
that these tests are ones in which speed per 
se is not required; and, in fact, would be 
actually detrimental for subjects who were 
freshmen and sophomores in high school. 
That is, the more quickly they would rush 
through the selection of the answers on these 
tests, the more prone they would be to overlook 
clues that would lead to the correct answers. 
This in turn would lead to either zero or 
negative loadings. The absence of negative 
loadings for the “error” variables seems to 
indicate that accuracy was not important 
on this factor. We must conclude, then, 
that factor II seems almost certainly to be 
one dealing with Speed, particularly in making 
simple discriminations. Factor II accounts 
for 14 per cent of the total variance. 

It appears from Table 2 that factor III is
also overdetermined. The two highest factor
loadings are on tests that demand a high
degree of verbal ability, 8 and 12. Tests 15,
16, and 36 all contain items that require
verbal comprehension. Tests 2, 4, and 13
involve verbal material, although they do not
require the complexity of interpretation demanded
in the above mentioned tests. The
only two tests that are not concerned with
verbal items are Tests 9 and 14. Both of
these seem to include the factor of immediate
memory and the grasping of relationships.
On the low side of the ledger, Tests 5 and 7
show negligible loadings, which indicates that
factor III does not require much in the way of
speed and perceptual analysis.

The only negative loadings, on variables 25
and 29, seem to point up the verbal aspect of
this factor, since these negative loadings
resulted from the “error” scores of the word
comprehension and the spelling tests. However,
the significant loadings on two tests
containing number material seem to indicate
something in addition to verbal ability. One
logical interpretation is that the factor could
be tentatively identified as Comprehension of
Relationships, with verbal ability assuming a
more important role. It would seem that the
presence of this factor would be increasingly
more significant for the performance of more
complex clerical job functions. The third
factor accounts for 11 per cent of the total
variance.

A final comment on the total variance of 
the three factors seems indicated since they 
account for only 41 per cent of the total 
variance. The ability required by this partic- 
ular battery must be somewhat more in 
unique rather than in common factor space. 
Over half of the variance of the battery needs 
to be explained further on the basis of unique 
and unrelated components. 
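
The percentages quoted for each factor follow from the usual convention that, with standardized variables and orthogonal factors, a factor's share of the total variance is the sum of its squared loadings divided by the number of variables. The Python sketch below illustrates that arithmetic only. The loading matrix is invented for the example and is not taken from Table 2, and the function name is hypothetical.

```python
def variance_shares(loadings):
    """Return each factor's proportion of total variance.

    `loadings` holds one row per variable and one column per factor.
    Variables are assumed standardized (variance 1 each) and factors
    orthogonal, so total variance equals the number of variables.
    """
    n_vars = len(loadings)
    n_factors = len(loadings[0])
    return [
        sum(row[k] ** 2 for row in loadings) / n_vars
        for k in range(n_factors)
    ]


# Hypothetical loadings for four variables on three factors.
example = [
    [0.60, 0.00, 0.00],
    [0.00, 0.50, 0.00],
    [0.00, 0.00, 0.40],
    [0.00, 0.00, 0.00],
]

shares = variance_shares(example)   # share of total variance per factor
common = sum(shares)                # variance in common factor space
unique = 1.0 - common               # remainder: unique and error variance
print([round(s, 4) for s in shares], round(common, 4), round(unique, 4))
```

Applied to a full loading matrix, the same sum gives the common-factor share, and whatever is left over is the portion the text describes as lying in unique rather than common factor space.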

Although the inclusion of the “error” scores 
did not reveal any new factors, they did aid 
in the identification of the factors found in 
the present study and therefore probably 
should be included routinely in the matrices. 


Summary and Conclusions 


Results of this investigation suggest several 
considerations. 


1. The intercorrelations among the various 
tests, all of which purport to measure clerical 
aptitudes, cannot be adequately accounted for 
by postulating a single general factor of
clerical ability. It would seem that there are 
several different components influencing per- 
formance on these seemingly similar kinds of 
tests. 

2. Since only 41 per cent of the total 
variance was accounted for by these three 
factors, there may be other factors that 
would account for scores made on the general 
clerical aptitude tests. If further tests of 
types similar to those that have low loadings 
on all three factors had been added, it is 
probable that additional identified factors 
would show up. 

3. The Minnesota Clerical Test, which 
involves checking pairs of numbers and names 
for similarities and differences, seems to be 
related positively to more general types of 
clerical aptitude tests than any other tests 
included in the battery. This test accounted 
for more variance than any other and was 
found to have high loadings on all three 
factors. This seems to corroborate the results 
of Andrew (1), who found the Minnesota test 
to be more significant in measuring clerical 
aptitudes than any other test used in her 
battery.

4. The present study was limited to an
analysis of the factors present in commonly 
used clerical aptitude tests. Naturally, factors 
other than basic aptitudes, such as social 
effectiveness, interest in the work, co-opera- 
tiveness, etc., might also enter into actual 
clerical job success. 


Received October 9, 1950.


Mental Ability Tests in Clerical Selection


Edward N. Hay 
Edward N. Hay & Associates, Inc., Philadelphia


“Cross Validation of Clerical Aptitude 
Tests” (1) was a report of the results of a 
study of prediction of success in a group of 82 
key punch operators by means of a battery of 
three clerical tests. The tests used were 
Minnesota Numbers, Hay Number Series 
Completion, and Hay Name Finding. These 
three clerical tests require 14 minutes of testing 
time. This study corroborates an earlier
study of 39 machine bookkeepers (2). 

Subsequently the question arose whether
scores from mental ability tests were available
for this group; and if so, whether they showed
that mental ability was a factor in the performance.
Such scores were available and they
show that while there was undoubtedly a 
mental alertness factor in the performance it 
did not serve to improve prediction over that 
furnished by the clerical aptitude tests alone. 
The mean scores of the two criterion groups on 
the Wonderlic Personnel Test were as follows: 


Rating        Mean Score     Sigma      N

“Good”           28.0         .721      53
“Poor”           25.5         .888      29


The standard error of this difference is
1.14 and the difference is significant at the 3%
level. These scores bear out the expectation
that a mental alertness factor was at work.


Table 1
Twenty-one Combinations of Multiple Cutting Scores Applied to 53 Key Punch Operators
Rated “Good” and 29 Rated “Poor”

[Table body not legible. Column headings: Tests and Scores (Pers. Test; Minn. Nos.;
Minn. Names; Number Series ‘A’; Name Finding ‘A’; Multiple ‘R’); Results (Passed: No.
and %; Order of Excellence; % “Good” of Those Who Pass; Level of Significance).]

* Cumbersome; requires 5 tests.
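
The reported standard error of 1.14 for the difference between the two Wonderlic means can be checked with a few lines of Python. The sketch assumes that the "Sigma" column gives the standard error of each group mean (.721 and .888), since that reading reproduces the reported figure, and it uses a normal approximation for the significance level. Both points are assumptions rather than statements from the report.

```python
import math

# Group means and (assumed) standard errors of the means.
mean_good, se_good = 28.0, 0.721   # 53 operators rated "Good"
mean_poor, se_poor = 25.5, 0.888   # 29 operators rated "Poor"

diff = mean_good - mean_poor

# Standard error of a difference between two independent means.
se_diff = math.sqrt(se_good ** 2 + se_poor ** 2)

# Critical ratio and two-sided normal-curve probability.
critical_ratio = diff / se_diff
p_two_sided = math.erfc(critical_ratio / math.sqrt(2))

print(round(se_diff, 2), round(critical_ratio, 2), round(p_two_sided, 3))
```

Under these assumptions the standard error rounds to 1.14 and the two-sided probability comes out close to .03, in line with the 3% level quoted in the text.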

While the article referred to (1) dealt only 
with the cross validation of three clerical 
aptitude tests, it may be of interest to see the 
results of applying 30 different combinations 
of tests to the two groups of key punch clerks, 
53 rated “good” and 29 rated “poor.” 
Twenty-one of these are shown in Table 1. 
It can be seen that although there is a mental 
alertness factor at work, in this particular 
group there is a decrease in prediction efficiency 
when a mental ability test is used, either alone 
or in combination with other tests. 

Experience shows that it is possible to
select for the simpler routine clerical tasks
without using a mental ability test since
clerical aptitude tests alone are sufficient.
However, a verbal-numerical mental ability 
test is a valuable addition to the test battery 
for identifying the quick learners, a quality 
essential for promotion to supervisory and 
technical positions. 

If any other combination of tests than the 
21 shown in Table 1 had given better results 
it would have been reported. All the better 
combinations appear in this table. It should
be kept in mind that there has been some
selection in this group of 82 key punch operators,
since they were given these same tests
when hired. Probably the only reason there
is a good range of ability in this group is that
these were war-time hirings where the market
did not provide as much choice as usual. 

The scores on Minnesota name checking 
were not shown in the original report (1) but 
are given here in Table 1. It can be seen 
that name checking was only slightly less 
efficient than other tests. 

It can be seen also that multiple regression is 
considerably less efficient than the “Multiple
Cutting Score” method, in that fewer “good”
subjects can “pass.” In a tight labor market
such restricted hiring would be impracticable. 
The lower efficiency of multiple regression is 
probably due to regression not being rectilinear, 
a condition precedent to the use of product 
moment correlation (3). 
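
The difference between the two selection rules can be sketched in Python. Under multiple cutting scores an applicant must reach a minimum on every test, whereas a regression equation yields one weighted composite in which a high score on one test can offset a low score on another. All cutting scores, weights, thresholds, and applicant scores below are hypothetical and are not taken from Table 1. The sketch shows only how the rules can disagree, not the comparative efficiency reported above.

```python
def passes_multiple_cutoff(scores, cutoffs):
    """Pass only if every test score meets its own cutting score."""
    return all(scores[test] >= cut for test, cut in cutoffs.items())


def passes_composite(scores, weights, threshold):
    """Pass if the weighted sum of scores reaches a single threshold."""
    composite = sum(w * scores[test] for test, w in weights.items())
    return composite >= threshold


# Hypothetical cutting scores, regression-style weights, and threshold.
cutoffs = {"numbers": 100, "series": 15, "names": 30}
weights = {"numbers": 0.5, "series": 2.0, "names": 1.0}
threshold = 110.0

# Hypothetical applicants.
applicants = {
    "A": {"numbers": 120, "series": 20, "names": 35},
    "B": {"numbers": 150, "series": 10, "names": 40},  # weak on one test
    "C": {"numbers": 95, "series": 14, "names": 28},
}

# For each applicant: (passes cutting scores, passes composite).
results = {
    name: (
        passes_multiple_cutoff(s, cutoffs),
        passes_composite(s, weights, threshold),
    )
    for name, s in applicants.items()
}
print(results)
```

Applicant B illustrates the compensatory character of the composite: a very high score on one test carries the applicant past the threshold despite a score below the cutting point on another.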


Summary 


A re-examination of a previous report (1) 
on predicting success in key punch machine 
operation shows that mental ability tests are 
less efficient than so-called clerical aptitude 
tests for selecting clerks for routine tasks. 


Received March 9, 1951.
Early publication.


Reliability of Ratings of Employee Satisfaction Based on
Written Interview Records 


James F. Carey, Jr., Irwin A. Berg, and A. C. Van Dusen 
Northwestern University 


A number of studies (2, 4, 5, 6, 7, 8) have 
questioned the reliability and validity of 
various interviewing procedures. Yet the 
interview remains the most frequently used 
tool for obtaining and supplying information. 
Also, as Bingham and Moore (1) note, it is 
often used for motivating another person. 
Undoubtedly, some of the reasons for the 
wide use of the interview relate to its con- 
venience and adaptability. It is available to 
anyone who can talk and in forms which vary 
from the simplest question-answer conversation 
to the most labyrinthine circumlocution 
uttered from a psychoanalyst’s couch. Then, 
too, there is the conviction held by many 
persons that one is more likely to get at the 
“truth” by means of a face-to-face interview. 

It was partly due to this latter belief that 
the present study was made possible. A 
large midwestern manufacturing company had 
utilized a mail questionnaire with the aim of 
assessing salaried employee attitudes concern- 
ing their jobs. Believing that deeper feelings 
might be more readily expressed through 
interviews, management officials engaged a 
psychologist to arrange private interviews with 
a sample of the same employees. 


Procedure 


Accordingly, a team of 13 interviewers was
used to interview a sample composed of 186
out of a total of slightly more than 2,000
salaried workers. All of the interviewers had
had a year-long graduate practicum in counseling,
and all but two of them had held full-time
positions as professional counselors. The
sample was obtained by picking every tenth
card from the alphabetized employee file.
Because of absences, the proposed 10 per cent
sample was reduced to 9 per cent.

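
Picking every tenth card from an ordered file is a systematic sample. A minimal Python sketch of the procedure follows. The employee file here is a hypothetical list of placeholder names, and the starting position is an assumption, since the report does not say which card was drawn first.

```python
def systematic_sample(cards, step=10, start=0):
    """Return every `step`-th card, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return cards[start::step]


# Hypothetical alphabetized file of 2,000 placeholder names.
employee_file = sorted(f"Employee {i:04d}" for i in range(2000))

sample = systematic_sample(employee_file, step=10)
print(len(sample))  # 200 cards, i.e. a 10 per cent sample
```

Absences would then remove some of the selected cards, which is how a proposed 10 per cent sample can shrink to about 9 per cent.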
The actual interviewing was preceded by a 
series of pilot interviews held in another plant 
of the same firm. By means of this “dry run,”
the interviewers became acquainted with the
company operations and with the specific 
interview situation. The interview procedure 
centered around a form prepared by the 
psychologist and company representatives. 
This form consisted of a series of questions 
directed at seven carefully defined topics of 
interest to the company: feeling toward 
adequacy of training, chances for advancement, 
satisfaction with management, job satisfaction, 
satisfaction with supervision, attitude toward 
family participation in company social and 
recreational activities, and attitude toward the 
interview. The interviewing procedure was 
essentially that of the open-end technique. 
A previously prepared question related to the 
topic to be investigated was asked, and the 
interviewee was encouraged to elaborate as 
fully as possible by the counselor’s restating, 
clarifying, and reflecting the expressed feeling. 
The counselor took notes during the course of 
the interview and wrote them up more fully 
after the interview was concluded. Each 
counselor then rated the employee’s responses 
on a 4-point degree of satisfaction scale. 
Considerable care was taken to secure under- 
standing and agreement by the interviewers 
concerning what was meant by the four
categories of very satisfied, satisfied, dissatisfied, 
and very dissatisfied. Specific definitions and 
examples of degree of satisfaction for each of 
the seven topics were prepared after the pilot 
study. 

Each interview was held under private 
conditions. After the nature of the study and 
the sampling were explained, each employee 
was given careful assurance of complete 
anonymity. Most of the employees appar- 
ently accepted this assurance, for in only 13 
cases did the interviewers note that the 
employee seemed reticent or guarded in his 
replies. 

It is accepted rather generally that judgments
or ratings made of interview data of
the sort recorded in the survey described above
are more valid and more reliable when the 
rater has had experience and formal training 
in working with such material. Indeed, it 
might be observed that it is upon this premise 
that courses in interviewing are offered and 
selections for many personnel jobs are made. 
While some of the studies previously cited 
indicate that experienced interviewers can 
be quite unreliable in judgments based upon 
interview data, there is some evidence that 
reliability is highest when similar judgments 
are made by experienced persons with formal 
training. Fiedler (3) found, for example, 
that experienced psychotherapists of various 
theoretical orientations agreed notably better 
than novices or persons of lesser training 
when making judgments concerning the con- 
cept of a good therapeutic relationship. 

In the present investigation an analysis of 
data bearing upon these factors of experience 
and formal training is presented. The specific 
question may be phrased: Will persons with 
more experience and formal training rate 
written interview responses on various topics
more reliably than persons of lesser experience 
and training? 

Analysis 

Twenty-five interview records were 
randomly selected from the 186 records of the 
survey described earlier. While the original 
survey included seven topics, only three were 
used in the present study in order that all 
ratings could be completed in classroom 
periods of 50 minutes. These three topics 
concerned adequacy of training, chances for 
advancement, and satisfaction with management. 
The questions and the responses pertaining to 
these three topics on the 25 records were 
completely reproduced except for an occasional 
phrase which identified the company. Care- 
fully prepared rating directions and examples 
were also completely reproduced, together 
with a sample interview which had already 
been rated. 

To test the hypothesis that experience and 
formal training would be related to the 
ability to rate the above interview material 
reliably, 4 groups of subjects were used who 
differed in level of sophistication in evaluating 
such data.

Group 1: 55 students (47 males, 8 females)
in an elementary psychology class. The 
class was in its second week at the time of the 
rating. These students were 18 to 19 years 
old and had little work experience. This 
group was considered to be the least sophis- 
ticated for the purposes of study. 

Group 2: 53 students (39 males, 14 females) 
in a course in industrial psychology. All but 
3 of these students were from 19 through 21 
years old. The members of this group had 
completed at least one previous course in 
psychology. The work experience of this 
group, while more extensive than that of
Group 1, was largely limited to summer 
jobs. Since this group was older and had 
more extensive and pertinent training in 
psychology, it was presumed to be somewhat 
less naive than Group 1 about ratings of the 
sort required in the present study. 

Group 3: 21 evening college students (11 
males, 10 females) in an advanced psychology 
class “Interviewing and Counseling.” In age 
this group ranged from 23 to 29 years. All 
members of the group were employed in 
regular, full-time jobs and had been thus 
employed for a minimum of 2 years. Further, 
all class members had completed courses in 
elementary psychology, industrial psychology, 
and personnel methods since these courses 
were minimal prerequisites for the present 
class in interviewing. Because of more ex- 
tensive work experience and greater formal 
training, this group was considered to be of a 
higher level of sophistication than Groups 1 
and 2.

Group 4: 10 graduate students (6 males, 
4 females) enrolled in a practicum course in 
counseling. The age range was 25 to 42 
years inclusive. All members of this group 
were working toward graduate degrees in 
psychology or educational personnel work. 
All had had some full-time professional 
experience in interviewing or counseling, and 
all had had at least one semester of supervised 
practice in counseling as part of the course 
work. Because of formal training in psychol- 
ogy, training in counseling procedures and 
professional work experience, this group was 
considered to be the most sophisticated for 
the purposes of the present investigation.

Each of the 4 groups met under classroom
conditions. After the instructions and sample 
interview were read aloud, the group members 
were asked to rate each of the 3 topics in the 
25 interviews on the 4-point scale of level of 
satisfaction. Every member of each group 
had his own complete set of instructions, 
definitions, and interview data so that review 
of any part of the material was possible at 
any time. 

By this procedure, 10,425 ratings (75 
ratings made by 139 persons) were obtained. 
Each rating was assigned a numerical value 
as follows: very satisfied, 1; satisfied, 2; 
dissatisfied, 3; very dissatisfied, 4. These 
data were then tabulated and measures of 
reliability obtained for each topic within each 
group. 

Table 1
Reliability of Ratings

                             Group 1   Group 2   Group 3   Group 4

Training       Reliability     .67       .65       .67      [...]
               S.E.            .01       .01       .01       .02

Advancement    Reliability     .67      [...]      .69      [...]
               S.E.            .01       .01       .01       .02

Management     Reliability     .70       .72       .71       .69
               S.E.            .01       .01       .01       .02

All Topics     Reliability     .67      [...]     [...]     [...]
               S.E.            .01       .01       .01      [...]


Table 5
Means and Standard Errors of Mean Ratings

                        Group 1   Group 2   Group 3   Group 4   All Groups

Training       M          2.4       2.4       2.3       2.4        2.4
               S.E.       .88       .88       .85       .82        .86

Advancement    M          2.4       2.3       2.3       2.4        2.3
               S.E.       .86       .79       .81       .76        .82

Management     M          2.5       2.4       2.4       2.5        2.4
               S.E.       .91       .86       .95       .83        .89

All Topics     M          2.4       2.4       2.3       2.4       [...]
               S.E.       .89       .83       .87       .80       [...]


The measure of reliability used in the
present study is one developed by Professor
E. L. Clark of Northwestern University. As
this method is as yet unpublished, a brief
explanation is necessary.

The reliability of judgments, when defined
as the level of agreement among judges,
can be readily determined from only two
standard deviations. If the number of judgments
made on each item rated is a constant
number, we can use the ratio of the standard
deviation of the mean judgments to the
standard deviation of all individual judgments
made as an indication of the reliability of the
ratings. This ratio can be shown to be the
correlation between the average judgment
made on each item rated and the individual
judgments which compose the averages. In
order to get away from the element of correlating
judgments with themselves we can apply
a correction for the ratio which, under the
assumption of equal variability of judgments
for each judge, will give us the reliability of
individual judgments. The formula is:

$$ r_{ii} = \frac{n\, r_{a,j}^{2} - 1}{n - 1} $$

where $n$ is the number of
judgments composing each average, $r_{a,j}$ is
the ratio of the standard deviation of mean
judgments to the standard deviation of all
judgments, and $r_{ii}$ is the reliability of all
individual judgments.
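
The ratio and its correction can be sketched in Python as follows. This is a minimal illustration and not the authors' own computation. It assumes the correction has the form r_ii = (n r_aj^2 - 1) / (n - 1), which is what the equal-variability assumption implies. The use of population standard deviations is a further assumption, and the small rating matrix is invented for the example.

```python
import statistics


def clark_reliability(ratings):
    """Return (ratio, corrected reliability) for a table of ratings.

    `ratings` holds one row per item rated; each row contains the
    ratings given to that item by the same n judges.
    """
    n = len(ratings[0])
    item_means = [sum(item) / n for item in ratings]
    all_ratings = [r for item in ratings for r in item]

    # Ratio of the SD of mean judgments to the SD of all judgments.
    ratio = statistics.pstdev(item_means) / statistics.pstdev(all_ratings)

    # Assumed correction removing the judgments' correlation with themselves.
    corrected = (n * ratio ** 2 - 1) / (n - 1)
    return ratio, corrected


# Invented example: 4 interview topics, each rated by 3 judges on the
# 4-point scale (1 = very satisfied ... 4 = very dissatisfied).
example = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 4, 3],
    [4, 3, 4],
]

ratio, corrected = clark_reliability(example)
print(round(ratio, 3), round(corrected, 3))
```

When every judge gives identical ratings the ratio and the corrected value both equal 1. As disagreement grows, the corrected value falls below the uncorrected ratio, which is the point of removing the self-correlation.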


Results and Discussion 


While there is some dispersion of ratings, 
examination of Table 1 reveals a fairly high 
level of rating consistency. The four groups 
of raters, though differing in the character- 
istics of training and experience deemed 
important for satisfactory rating, show little 
difference in the respective reliability of their 
ratings. If these characteristics were impor- 
tant for consistent rating of material of the 
kind used in the present study, one would 
expect to find an increase in the reliability 
ratio as sophistication in dealing with such 
materials increased. 

Tables 2, 3, and 4! present the mean ratings 


1 Tables 2, 3, and 4 have been deposited with the 
American Documentation Institute. Order Document 
3268 from American Documentation Institute, 1719 N 
St., N.W., Washington 6, D. C., remitting $1.00 for 
microfilm (images 1 inch high on standard 35 mm. 
motion picture film) or $1.00 for photocopies (6 X 8 
in.) readable without optical aid. 
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made by each of the four groups for the 25 
interviews. The mean rating of any one 
group for any topic or for any interview 
closely approximates the ratings of the other 
groups. Clearly this is remarkable agreement 
and at a level which would be incredible if 
training and experience exerted significant 
influence as postulated in this study. In 
Table 5 the data for all interviews are sum- 
marized by topic and by groups of raters. 
Although the differences are slight, it will
be noted that groups 1 and 4 are more in
agreement with each other than groups 2 and
3. When the data for all topics are combined,
the difference between the mean ratings of
groups 1 and 4 as compared to groups 2 and 3
is significant at the 2 per cent level of con-
fidence. Since group 1 is the least sophis-
ticated in terms of this study and group 4
most sophisticated, the chief point of difference 
between the groups is in direct opposition to 
the supposition that training and experience 
would make for more consistent rating of 
the material used in this investigation. 

From these data it seems clear that level 
of training and pertinent experience do not 
markedly influence ratings made of the inter- 
view material used in the present study. 
The consistency of the ratings is, in all prob- 
ability, due to the nature of the material 
and the carefully prepared rating instructions. 
In other words the task is virtually one of 
reading comprehension rather than one of 
judgment based upon special training and 
experience. Despite the fact that special
training and experience are usually prescribed
in morale interview studies, the present 
findings would suggest that such precautions 
are not likely to improve ratings of recorded 
interviews when a procedure similar to the 
one described above is followed. Of course all 
four of the groups in the present study had a 
year or more of college. Hence, a certain 
degree of literacy may be essential for reliable 
ratings of such materials. 


Summary 


A large midwestern manufacturing firm 
had written records of salaried employee 
interviews which had been used in a morale 
survey. Twenty-five of these interviews were 
selected at random and the portions of the
interviews dealing with training, promotion,
and management were reproduced. As in 
the original study, each interview was rated on 
a 4-point satisfaction scale of very satisfied; 
satisfied; dissatisfied; very dissatisfied. The 
ratings were made on the basis of the inter- 
viewee’s reproduced responses to questions 
concerned with training, etc. 

Four groups of raters differing in level of 
training and pertinent experience were used 
in the study: group 1 was composed of 55 
students in an elementary psychology class; 
group 2 was made up of 53 students in a class 
in industrial psychology; group 3 was a group 
of 21 employed adults in an evening college 
class in “interviewing and counseling”; group 
4 was composed of 10 graduate students, all 
but two of whom had had full-time inter- 
viewing experience and all of whom had had 
more than a semester’s work in a practicum 
course in counseling. 

Although these groups differed markedly 
in level of training and experience there were 
no significant differences in their rating con- 
sistency for the three topics or any of the 
individual interviews. All four groups 
achieved a fairly high level of reliability, and 
all four groups were remarkably uniform in 
their mean ratings. 


Received March 26, 1951. 
Stability of Job Preferences of Department Store Employees *


Einar Hardin, Hans G. Reif, and Herbert G. Heneman, Jr. 
Industrial Relations Center, University of Minnesota 


Job preferences of employees and applicants 
have become more and more widely recognized 
as important factors in the success or failure 
of personnel and industrial relations programs 
in industry. According to Jurgensen, accurate 
information on job preferences is a valuable 
aid in designing and revising personnel 
policies and practices, including recruitment 
programs, in supervisory training, in diagnosis 
of employee morale, in collective bargaining, 
and in interviewing job applicants (7). 

The study reported in this paper was 
designed to measure job preferences of em- 
ployees in two department stores by means of 
the Jurgensen Job Preference Blank (7) and, 
in particular, to measure the test-retest 
stability of these preferences. Most of the 
above applications of the Blank require high 
test-retest stability of mean rankings of job 
preference factors by groups. Use of the 
Blank for selection, placement, and transfer 
of individuals would require high test-retest 
stability of job preference rankings by in-
dividuals.

Studies of job preferences have been reported 
by Berdie (1), Chant (2), Dunn (3), Hersey 
(6), Jurgensen (7, 8), Oxlade (11), Stagner 
(12), Wyatt, Langdon and Stock (13), the 
National Industrial Conference Board (10), 
the Fortune Survey (5), and others. Job 
security and advancement opportunities were 
generally ranked as most important in these 


* This study was done as part of the University of 
Minnesota Industrial Relations Center research studies 
known as the Triple Audit of Industrial Relations. 
Dr. Dale Yoder is director of the Center. Dr. Herbert 
G. Heneman, Jr. is assistant director and supervised 
the present study, and aided in preparation of the
manuscript. Einar Hardin and Hans G. Reif, former
staff members, had a major part in preparing this paper. 
Acknowledgment should also be made of statistical 
assistance given by David A. Leonard, former staff 
member. The constructive criticisms of Mr. Clifford 
E. Jurgensen of the Minneapolis Gas Company are also 
gratefully acknowledged. The background of this study 
is presented in more detail in The triple audit of industrial 
relations, University of Minnesota Industrial Relations 
Center Bulletin 11, Minneapolis: University of Minne-
sota Press, July 1951.





studies; pay was typically ranked in the upper
middle positions; and benefits were generally 
ranked as least important. 


Collection of Data 


In the present study, data on job preferences 
were collected in two midwestern department 
stores in the spring and summer of 1949, 
using the Jurgensen Job Preference Blank. 
In one store, a random sample (N = 44) of 
all employees (sales and non-sales) was drawn 
from a list of all rank and file employees. 
The job preference blank was administered 
individually at the end of an interview on 
employee attitudes toward their employment. 
The blank was not discussed in the interview, 
and the employee filled out the blank without 
assistance. 

In the other store,¹ two separate random
samples of all non-supervisory employees 
(sales and non-sales) were drawn. In both 
samples the job preference blank was admin- 
istered together with an attitude questionnaire 
to employees in groups. The employees in 
the first sample (N = 89) were asked to sign 
their questionnaires, while the employees in 
the second group (N = 68) retained their 
anonymity. A random sample (N = 23) of 
all supervisory employees was also drawn, 
and the same attitude questionnaire and job 
preference blank were administered anon- 
ymously in groups. Finally, the attitude 
questionnaire and job preference blank were 
administered twice to all members of a sample 
(N = 39) selected for a related study of 
employee attitudes. This sample was used 
for all of the test-retest comparisons reported 
in the present study and is called the “test- 
retest” sample. Two weeks elapsed between 
administrations in this group. 

About 13 per cent of all job preference 
blanks were found to be incomplete and were 


1 The two stores were members of a chain of stores 
and had substantially similar policies and practices. 
discarded. (The sample sizes above refer to
the number of usable forms.) This relatively 
large percentage may be attributed to the 
fact that the monitors at the administration 
did not check the finished forms closely 
enough for omissions, that the job preference 
blank was appended to another longer ques- 
tionnaire and overlooked by respondents, that 
there was some hesitancy on the part of some 
respondents to attempt the rankings, and 
that in the interview situation, interviewers
were instructed not to force a ranking from
the respondent after several attempts at
explanation.


Analysis and Findings 


Mean Rankings: Mean rankings of each 
job factor are shown in Table 1. 

Table 1 shows some clustering of job 
preference factors and some interesting differ- 
ences in mean rankings between successive 
rank positions. Also, differences in rankings 
between the male and female groups should be 
noted. In general, rankings are similar to 
those shown in other studies. 

Similarity between the various sample 
groups in the two stores was measured by the 
coefficient of product-moment correlation be- 
tween the mean rankings of job preference 
factors. The following comparisons were run 
separately for men and for women: the 
identified samples in each store, the anonymous 
and identified samples in Store 2, and the 


Table 1

Mean Rankings of Ten Job Preference Factors by
Employees in Two Department Stores

(Columns: Men, sample size 62; Women. Factors ranked: Security,
Advancement, Type of Work, Pay, Hours, Supervisor, Working Conditions,
Co-workers, Company, Benefits.)

anonymous and the supervisory samples in
Store 2. In addition, comparisons were made 
between men and women within each of the 
five samples used in the study. All of these 
correlations were found to be significantly 
different at the 5 per cent level or better.
Significance testing showed more similarity 
between male supervisors and male non-super- 
visors than between female supervisors and 
female non-supervisory employees. More 
similarity in rankings was found within Store 
2 (comparing the anonymous and identified 
samples) than between Store 1 and Store 2 
(using identified samples only). 

Stability of Mean Rankings: When a sample 
of 39 employees was retested at the end of a 
two-week period, test-retest stability of mean 
rankings was found to be .98, as measured by 
the product-moment correlation coefficient 
for both men and women. 

Rankings of Individual Job Preference Factors: 
Test-retest stability of rankings of individual 
job preference factors was also investigated. 
For each of the ten factors, original and retest 
rankings were correlated. The rank assigned 
a given factor on the original test was cor- 
related with the rank assigned the same 
factor on the retest by each of the 39 individ- 
uals in the group. It is recognized that the 
correlation coefficients thus obtained reflect 
not only differences in stability of rankings 
but also variability of ranks assigned to each 
factor. In other words, one would expect 
that a factor ranked within a relatively 
narrow range by the 39 individuals would have 
a higher correlation in test-retest rankings 
than would a factor that is considered very 
important by some persons and relatively 
unimportant by others. The coefficients, all 
significant at the 1 per cent level, are shown in
Table 2. The stability of rankings for the
co-worker factor is significantly higher than 
the stability of rankings for any other factor. 
Differences in stability between the remaining 
nine factors were not significant. The average 
correlation coefficient for these nine factors 
was .62. It is reasonably certain, at the 1 
per cent level of confidence, that the true 
mean of the test-retest correlation coefficients 
for these nine factors did not exceed .71. 
Fisher’s r-z transformation was used in the 
above computations (4). 
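
A minimal sketch of averaging correlation coefficients through Fisher's r-z transformation follows. It is an illustration of the general technique, not a reconstruction of the authors' computations; the function names are hypothetical, and the three input values are coefficients from Table 2 used only as sample input. The upper-limit helper uses the usual large-sample standard error of z, 1/sqrt(N - 3), for a single coefficient.

```python
import math


def mean_correlation(coefficients):
    """Average correlations by transforming to z, averaging, and transforming back."""
    zs = [math.atanh(r) for r in coefficients]
    return math.tanh(sum(zs) / len(zs))


def upper_limit(r, sample_size, z_crit):
    """Upper confidence limit for one correlation, computed on the z scale.

    z_crit is the normal deviate for the chosen confidence level
    (supplied by the caller; no level is assumed here).
    """
    standard_error = 1.0 / math.sqrt(sample_size - 3)
    return math.tanh(math.atanh(r) + z_crit * standard_error)


# Sample input only: three of the test-retest coefficients, N = 39.
sample = [0.71, 0.60, 0.54]
print(round(mean_correlation(sample), 2))
print(round(upper_limit(0.60, sample_size=39, z_crit=2.326), 2))
```

Averaging on the z scale rather than averaging the coefficients directly matters most when the coefficients are large, because the sampling distribution of r is skewed near 1.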
Rankings of the Ten Job Preference Factors
by Individual Respondents: Employees in both 
department stores, as well as Jurgensen’s job 
applicants, rank type of work and security 
high, while benefits rank very low. Job 
applicants at the public-utility company differ 
from the department store employees in 
ranking company as more important, and hours 
and pay as less important. It would be 
interesting to know whether these differences 
result mainly from attempts of job applicants 
to tell the employment interviewer what they 
think he wants to hear, whether job preferences 
change after hiring, or whether job preferences 
of these groups were intrinsically different. 

Test-retest stability of the ten job preference 
factors by individual respondents was measured 
by use of Spearman’s coefficient of rank-order 
correlation (9). Thirty-nine rank-order cor-
relation coefficients were obtained from the
test-retest sample. The highest coefficient 
was found to be .95; the median coefficient 
.72; and the lowest coefficient −.05. In the
case of ten ranks, a coefficient of .63 is required 
at the 5 per cent level of significance and a 
coefficient of .76 at the 1 per cent level. 
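
For one respondent, the rank-order coefficient can be computed from the two sets of ten ranks as sketched below. This uses the standard Spearman formula for untied ranks; the function name and the example rankings are hypothetical and do not come from the study's data.

```python
def spearman_rho(first, second):
    """Spearman rank-order correlation for two untied rankings of the same items."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("rankings must have the same length, at least 2")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(first, second))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical respondent: ranks given to the ten factors on test and retest.
test_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
retest_ranks = [2, 1, 3, 5, 4, 6, 7, 8, 10, 9]
print(round(spearman_rho(test_ranks, retest_ranks), 3))
```

Each of the 39 coefficients reported above would come from one such pair of rankings, so the coefficient describes the stability of a single person's ordering rather than of any one factor.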

Some indication of the consistency of depart- 
ment store employees in ranking the ten job 
preference factors is available from the scatter 
diagram shown in Table 3. This diagram 
indicates that employees in the sample 
displayed some inconsistency in their individ- 


Table 2

Product-Moment Coefficients of Correlation Between
Test and Retest Rankings of Ten Job
Preference Factors

Note: Sample size is 39

Factor                  Correlation

Advancement                 .71
Benefits                    .60
Company                     .54
Co-workers                  .91
Hours
Pay
Security
Supervisor
Type of Work
Working Conditions
Table 3

Consistency of Individual Job Preference Rankings
Test-Retest Sample N = 39*

(Scatter diagram: Rank Assigned on First Administration plotted against
Rank Assigned on Second Administration.)


* The numbers plotted in each cell of the diagram 
represent the number of persons who ranked a factor 
as indicated. Because ten factors were ranked and 
re-ranked by each individual, the total of all the num- 
bers in the cells is 390. The items in this table have 
a linear correlation value of +.57. 


ual choices over a two-week period. It is 
interesting, however, to observe that most 
consistency was found for the first and last 
choices. These were more consistent than 
intermediate choices. The linear correlation 
coefficient for this table is +.57. 
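
A linear correlation of this kind can be computed directly from the cell counts of a scatter diagram, as in the sketch below. The function name and the small 3 x 3 table are hypothetical; the study's own 10 x 10 table of 390 entries is not reproduced here.

```python
import math


def correlation_from_table(counts):
    """Product-moment correlation from a table of cell frequencies.

    counts[i][j] is the number of cases with rank i + 1 on the first
    administration and rank j + 1 on the second (assumed layout).
    """
    n = sx = sy = sxx = syy = sxy = 0
    for i, row in enumerate(counts, start=1):
        for j, c in enumerate(row, start=1):
            n += c
            sx += c * i
            sy += c * j
            sxx += c * i * i
            syy += c * j * j
            sxy += c * i * j
    covariance = n * sxy - sx * sy
    variance_x = n * sxx - sx * sx
    variance_y = n * syy - sy * sy
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical table: most cases on the diagonal, a few one step off it.
small_table = [
    [5, 1, 0],
    [1, 4, 1],
    [0, 1, 5],
]
print(round(correlation_from_table(small_table), 2))
```

Cases on the diagonal are rankings repeated exactly on retest; the farther the counts spread from the diagonal, the lower the coefficient.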

Differences in rankings by individuals may 
be explained in at least four ways: (1) some 
individuals may lack the ability to rank as 
many as ten factors in order of importance; 
(2) some individuals may consider some factors
to be equally important; (3) individuals may
differ in motivation to fill out the Blank
properly; and (4) some persons may greatly 
change their opinions about what is important 
to them. The test-retest correlation of .57 
raises questions about the application of the 
Blank in establishing individual profiles of 
preferences, to be used for selection, placement, 
and transfer of individuals. However, from 
these studies one would expect that the factors 
initially ranked as 1 or 10 will, on retest, be 
ranked as 1, 2, or 3 and 8, 9, or 10, respectively. 


Summary 


1. Job preferences of samples of employees in 
two midwest department stores were measured 
by means of the Jurgensen Job Preference 
Blank. The Blank was administered to
random samples of employees in both stores. 

2. Mean rankings of job preferences were 
found to be similar to rankings by applicants 
to a public-utility company, reported by 
Jurgensen. The main differences were that 
the department store employees ranked 
company as less important and hours as 
more important than did the public-utility 
applicants and that employees in one of the 
department stores ranked advancement as 
substantially less important than did either 
the applicants to the public-utility company 
or the employees of the second department 
store. 

3. Test-retest stability of mean rankings 
was +.98 for both men and women. The 
Blank is sufficiently reliable to permit measure- 
ment of group preferences. 

4. Test-retest ranking of individual prefer- 
ence factors, however, showed more inconsist- 
ency than did group preferences. Additional 
investigation into consistency of individual 
employee rankings and determination of the
degree of stability needed in industrial person-
nel work appears desirable.
Relative Severity of Air Line Passenger Complaints


Seymour Banks 
De Paul University 


One of the most fundamental problems of 
the public relations department of a business 
is the evaluation of the intensity behind 
complaints against that business or any of its 
operations. The following is a discussion of 
work done in an attempt to answer the 
question raised by the superintendent of 
passenger relations of a major air line: how
to rank topics of irritation with the air line in 
order of their effect on passengers’ attitudes? 
Without knowing this, there was no simple 
way of determining the proper response to 
complaints. Obviously, the safest thing was 
to treat all complaints as if failure to reduce 
the passenger’s sense of discomfort would 
cause him to cease flying altogether. And as 
obviously, this was the least practicable 
procedure. The work completed was explor-
atory and the final answer was not forth-
coming; but it is felt that the procedure was
demonstrated to be valid and capable of
being used much more widely.

The superintendent was given the task of 
answering all letters, compliments and com-
plaints, sent to him.¹ This procedure of
tallying of complaints (and compliments too, 
but the chief interest was in the complaints) 
left many questions unanswered. The volume 
of letters was insufficient to lead the company 
to a specific source of irritation, e.g., the way 
baggage was handled in Chicago. There was 
no way to compare the relative seriousness of
complaints arising from the various company
services. The relative number of complaints 
was not accepted as being truly indicative of 
relative importance since everyone agreed that 
some sources of complaint were more signifi- 
cant than others. But nobody could agree on 


1 The bulk of the mail was unsolicited. However, 
some letters came in as the result of a kit placed at each 
seat containing literature on the air line and its opera-
tions. This kit contained a letter form on which the
passengers could write their views of the trip or anything 
else and send it to the company, postage free. Rela- 
tively few of these letter forms were sent to the com- 
pany and most of those sent in contained only broad, 
general compliments. 


the order of seriousness of the various sources 
of complaint. Finally, there was no way of 
determining the total amount of dissatisfaction 
on each of these sources of passenger irritation. 
The letters were like the exposed part of an 
iceberg and nobody knew how much of the ice- 
berg was under water. The ratio of articulated 
to total attitude might vary from service to 
service or between compliments and com- 
plaints. If this was so, then the relative 
volumes of letters received might be entirely 
misleading. 

The research procedure used to attack this 
set of problems was the Guttman scale analysis. 
This procedure was used because it ranked 
topics or statements along a favor-unfavor- 
ability scale while measuring simultaneously 
the intensity with which people were in 
favor of or opposed to these statements (2, 3, 
4). Respondents could be separated by 
attitude pattern and personal characteristics 
determined and related to expressed attitudes. 
This study of the characteristics of the 
individuals who hold given attitude structures 
is not possible in other attitude study pro- 
cedures which determine group attitudes by 
confounding individual statements (2). 

It was decided to start the work by examin- 
ing passenger attitudes. The passengers 
would be readily available for questioning 
while investigation of attitudes of the general 
public would call for a fairly involved sampling 
problem. The preliminary study concerned 
attitudes towards domestic air lines in general 
rather than any specific one. It was felt 
that in the exploratory stages, answers upon 
the general subject might be easier to obtain. 
Besides, these attitudes would serve as 
datum planes against which attitudes towards 
the specific air line might be compared when 
obtained subsequently. 

A six-question self-administered question- 
naire on passenger attitudes towards domestic 
commercial air lines was tested upon two 
transcontinental flights. This was an experi-
ment upon both the method of conducting
the survey and the applicability of the 
Guttman procedure. One half of the question- 
naires obtained comments upon attitude and 
the other half upon both attitude and intensity. 
It was found that the passengers responded 
unanimously to the request of the stewardesses 
to fill out a questionnaire, that both types of 
questionnaires gave comparable answers on 
attitude, and that there was sufficient indica- 
tion of structured attitudes to warrant further 
use of the Guttman scale technique. 

The pretest led to the elimination of one 
question because it had no discriminatory 
power. Everybody checked its most favorable 
category. The list of statements used for 
the attitude study was increased to ten. Two 
different questionnaire forms were drawn up 
for the second test; both asked the same 
questions but in reverse order. This was 
done to inhibit the passengers from being 
impressed by the answers of their seatmate. 
In order to make the self-administered ques- 
tionnaire a little more interesting to fill out 
and to prevent stereotyped answers, three 
different intensity questions were utilized and 
the location of the most favorable category was 
reversed on successive questions for both 
attitude and intensity. 

A total of 373 questionnaires were collected 
on day time transcontinental flights originating 
from Chicago on two successive days. The 
questionnaires were handed out by the 
stewardesses after the passengers were judged 
to have settled down from the excitement of 
the start and before a meal was served. As 
in the pre-test, all passengers participated by 
filling out questionnaires. 

The procedure developed by Guttman was 
used to analyze the completed questionnaires 
(1, 3, 5). The scale structure of an attitude 
universe is revealed by the distribution of the 
tabulated responses to individual attitude 
questions after the questionnaires have been 
arrayed on the basis of total score. Perfect 
scalability is indicated by a perfect pattern of 
stair steps for each attitude statement; the 
questionnaires with higher total scores all 
show more favorable responses on each attitude 
statement than those questionnaires with 
lower total scores. Cursory observation of the 
tabulated results of the 373 questionnaires 
revealed no such basic pattern of stair steps
within the categories of the ten attitude 
statements. Instead, there were long runs 
that blended into lower categories by staggered 
patterns and in some places, there was a 
random scatter. A combination of categories, 
reducing them to two per attitude statement, 
produced a fairly clear stair-step pattern. The 
questionnaires were rescored and retabulated 
on this new basis. 

The crucial point of the analysis is the 
location of the values of total attitude score 
that best separate those questionnaires of 
respondents showing favorable feelings on an 
attitude field from those that have unfavorable 
feelings. This is largely an empirical proce- 
dure; the only criterion is that the cutting 
points be selected to minimize error, error 
being defined as the failure of total score to 
predict judgment on a category within a 
question. This is easily done when the shift 
from high to low category takes place for 
only one question at a given level of total 
score. When two or more questions show 
this shift at a given level of total score, com- 
promises must be made to minimize the total 
error in the group of columns. Despite the 
apparent arbitrariness of the procedure, two 
selections of cutting points, made six months 
apart, differed only insignificantly. 
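
The selection of cutting points can be sketched in code under simplifying assumptions. Here every question has been reduced to two categories (1 favorable, 0 unfavorable), each question's cutting point is chosen independently as the total score that minimizes prediction error, and the compromises described above for questions that shift at the same score level are not modeled. All names and data are hypothetical; this is not Guttman's full procedure.

```python
def errors_at_cut(totals, responses, cut):
    """Count respondents whose answer differs from what the cut predicts.

    A total score at or above `cut` predicts a favorable (1) response.
    """
    return sum((t >= cut) != bool(r) for t, r in zip(totals, responses))


def best_cut(totals, responses):
    """Return (cut, errors) for the cutting point with the fewest errors."""
    candidates = sorted(set(totals)) + [max(totals) + 1]
    scored = [(errors_at_cut(totals, responses, c), c) for c in candidates]
    errors, cut = min(scored)
    return cut, errors


def scale_error_rate(answers):
    """Proportion of all responses mispredicted by total score.

    answers[p][q] is 1 or 0 for person p on question q.
    """
    totals = [sum(person) for person in answers]
    n_questions = len(answers[0])
    total_errors = 0
    for q in range(n_questions):
        responses = [person[q] for person in answers]
        total_errors += best_cut(totals, responses)[1]
    return total_errors / (len(answers) * n_questions)


# Hypothetical answers forming a perfect stair-step pattern: no error.
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(scale_error_rate(perfect))
```

The error rate returned by such a count is the quantity compared against the allowable limit of error for accepting an attitude as scalable.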

Guttman originally proposed that the allow- 
able amount of error in his scaling procedure 
be 15 per cent if the attitude were to be 
accepted as scalable for the group studied; 
this was later lowered to 10 per cent (2, 5, 
p. 287). The error found in the tabulation
of the questionnaires studied averaged 17 per 
cent, ranging from 8.5 to 26.1 per cent among 
the ten questions. This overall error was 
far above the allowable limit. Two questions 
were dropped from the further analysis 
because there was no consistent relation 
between answers to these particular questions 
and the answers to the other eight questions. 
One of these high-error questions was on 
difficulty of obtaining space on flights and the 
other dealt with the courtesy and helpfulness 
of air line employees. The remaining subset 
of eight questions had an average error of 15 
per cent. These were analyzed to yield the 
“scale” discussed below. Because of the 
high degree of error, it is stretching things 
Table 1

Content of Structured Passenger Attitudes Towards Air Line Operations, by Attitude Group
Key: + Favorable comment; − Unfavorable comment.

Questions on Air Line Operations                    Attitude Groups
                                                    C    D    E





Air lines handle baggage satisfactorily 

The time gained flying is worth the time lost 
travelling to and from the airports 

Air line terminals are well operated 

Air lines are well-organized businesses 

Air lines’ flight schedules are reliable 

Flights are comfortable 

Air lines take adequate safety precautions 

Air lines are to be used for all trips over 200 
miles in length 
quite a bit to call the results of the analysis a
scale yielding scale types. But since 85 per 
cent is close to 90 and because the data met 
Guttman’s other requirements (5, p. 287), 
they were handled as if they led to a scale. 

To avoid confusion, we shall speak of the 
people who were identified as possessing a 
structured attitude as belonging to an “attitude 
group.” The content of the structured atti- 
tudes towards air lines possessed by these 
attitude groups is presented in Table 1. Such 
structures are abstractions at best, omitting 
the individual variations actually found, but 
the summarized structures do exist and 
represent the actual attitudes of most of the 
respondents. Thus the single attitude group 
symbol represents the entire multivariate 
distribution of responses to each of the items 
within the attitude field of all the subjects 
found to be in that attitude group. 

From Table 1, the attitude structures held 
by each of the attitude groups and the relative 
importance of these groups can be seen. 
Thus, 15.3 per cent of the passengers were 
in Group C, who thought: 


air lines are to be used for all trips over 200 
miles; 

air lines take adequate safety precautions; 

flights are comfortable most of the time; 

air lines are reliable when it comes to living 
up to their schedules; and 

air lines are well-organized businesses. 


But the passengers in Group C thought: 


air line terminals are not well operated; 

more than half of the value of the time 
gained by flying is lost traveling to and 
from the airports; and 

their baggage is not handled carefully by the 
air lines. 


The Guttman scale analysis is tedious
work; the question was raised whether the original
data would give the same listing of order of 
severity of complaints as the Guttman pro- 
cedure. The questions were ranked in severity 
on the basis of number of responses in the 
Table 2

Average Number of Flights, Annually, by Attitude Groups

(... Favorable)                  Attitude Groups                  (... Favorable)
                                                      D      E      F      G      H
Number reporting          83     10     31     54     35     13     72     17     14
Av. number of flights     5.9    146    11.5   8.9    1443   198    198    19.3   12.3





unfavorable or low category. The coefficient 
of rank correlation between the rankings of 
these topics by scale analysis and by arraying 
in terms of proportion of favorable comment is 
.952. The simple process of arraying topics 
on the basis of proportion of unfavorable 
comment, therefore, did as well as the much 
more complicated scale analysis on the task 
of evaluating the order of annoyance or 
irritation with the various topics within the 
general attitude field. However, only the 
scale analysis procedure can determine the 
presence of the structured attitudes and 
allow the isolation, for further study, of 
those holding the various types of attitude 
found. 

The next part of the scaling procedure was 
the analysis of the intensity responses. First, 
the questionnaires were sorted by attitude 
groups. Then the median intensity score was 
determined for each attitude group. The 
median is used as a summary of the intensity
scores of the members of a scale type or
attitude group rather than the mean because
the former is less sensitive to a few extreme
scores than the latter (4, 5). The data of the
intensity analysis, when plotted, give the
graph of the intensity function of the attitude 
studied. 

The line of the intensity function found in 
the air line passengers’ attitude is irregular in 
its movement but there is a general downward 
trend of intensity from left to right; the groups 
more favorable to the air lines hold their 
attitudes more firmly than those less favorably 
inclined. Because of the ups and downs in 
the line, it is possible that attitude groups G 
and H do not represent the zero-point of the 
intensity function. But even if they did, 
then 90.4 per cent of the passengers studied 
were actually favorable towards air lines, 9.6
per cent were neutral and none were actually
unfavorable. This is not unexpected, since 
few people who were unfavorable towards 
air lines would be aboard a flight. 

After the attitude groups had been
identified and separated, it was possible to 
study their characteristics. Table 2 is a 
tabulation of the average number of flights 
made annually by the various groups located 
by the analysis. Some questionnaires did 
not give this information. In general, the 
more frequent fliers tend to be more critical 
than the less frequent fliers. Perhaps, after
many flights, the glamor is lost; besides, even
if the incidence of unpleasant occurrences is low,
if one flies quite a bit, something unpleasant
or disagreeable is likely to happen.
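
The last point can be made concrete with a short calculation. The sketch assumes that each flight carries the same small chance of an unpleasant occurrence and that flights are independent; the per-flight chance of 0.05 and the flight counts are hypothetical figures chosen for illustration, not estimates from the survey.

```python
def chance_of_at_least_one(p_per_flight, flights):
    """Probability of one or more unpleasant occurrences over a number of flights.

    Assumes a constant, independent chance per flight (an illustrative model).
    """
    if not 0.0 <= p_per_flight <= 1.0:
        raise ValueError("p_per_flight must lie between 0 and 1")
    if flights < 0:
        raise ValueError("flights must not be negative")
    return 1.0 - (1.0 - p_per_flight) ** flights


# Hypothetical comparison of an infrequent and a frequent flier.
for flights in (6, 20):
    print(flights, round(chance_of_at_least_one(0.05, flights), 2))
```

Under these assumptions the chance rises from roughly one in four at 6 flights a year to nearly two in three at 20, which is consistent with frequent fliers having more occasions for complaint.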


Summary 


The attitude of a sample of air line pas- 
sengers towards domestic commercial air 
lines was analysed by the Guttman scale 
procedure even though the degree of error was 
slightly greater than allowed. It was found 
that the intensity function was J-shaped, 
indicating no real unfavorability towards air 
lines among the respondents. Generally 
speaking, the attitude groups with the lowest 
average amount of flying showed the most 
favorable attitude. A ranking of the various 
topics on the basis of number of unfavorable 
comments agreed quite well with the ranking 
of topics produced by the scale analysis. 

It is recommended that public relations 
personnel and others concerned with public 
attitudes towards their organization or institu- 
tion use scale analyses periodically to deter- 
mine the degree of absolute favorability in 
people’s minds on this topic. They can also 
utilize the scale types found to have outstandingly 
good or outstandingly bad impressions 
of the organization for further research into 
the reasons why these individuals hold their 
attitudes. However, for routine operation 
control, the usual questionnaire procedure will 
serve adequately to point out conditions 
giving rise to annoyance or dissatisfaction. 


Predictors of Achievement in Graduate School 


Sam C. Webb 
Emory University 


When the Graduate School of Emory 
University began to expand after World War 
II, there developed a need for a more thorough 
evaluation of applicants over and above that 
afforded by the undergraduate transcript. 
While the Dean would have preferred to have 
test scores on some test such as the Graduate 
Record Examination to consider along with 
the transcript in evaluating an applicant 
seeking admission, for certain reasons this 
procedure was not practical. As a substitute 
procedure all graduate students were required 
to take a graduate qualifying examination 
consisting of the Cooperative General Culture 
Test and the Cooperative English Test, 
Higher Level, upon arrival on the campus. 
Satisfactory scores on both tests were required 
for admission to candidacy for the M.A. 
degree. A total score on the English Test 
equal to or above the 40th percentile on the 
National Senior Norms and a total score on 
the General Culture Test equal to or above the 
50th percentile on the National Sophomore 
Norms were defined as satisfactory scores. 
No limitation was imposed in regard to the 
number of times a person might take these 
tests. If a student failed to make the required 
scores on the tests, the committee admitting 
to candidacy would consider a strong depart- 
mental recommendation based on the student’s 
course work and might exempt the student 
from this requirement. Students presenting 
satisfactory scores on the Graduate Record 
Examination were excused from taking the 
qualifying examination. 

This article reports the results of a portion 
of a study which was conducted to evaluate 
the procedure as outlined above¹ and to 
evaluate all data currently available on 
graduate students to determine how well the 
traits and abilities desired of graduate students 
at Emory could be measured and predicted. 
The present article will be concerned only with 


¹ The writer analyzed the results but was not a participant 
in formulating the procedures outlined above. 


the value of the Cooperative General Culture 
Test, the Cooperative English Test and under- 
graduate averages as predictors of success in 
graduate school. 


Sample 

This study has been based upon the records 
of 492 graduate students who took the qualify- 
ing examination within the period winter 
1947 through winter 1949. This group con- 
sisted of students seeking graduate degrees, 
principally the M.A. degree, in 30 fields. 
The number of students in these fields ranged 
from 1 as in the fields of Bible, physiology and 
Spanish to a maximum of 146 students in the 
field of education. No department except 
education was represented by more than 42 
students. 


Grades as Criterion 


Of the data collected grades were the only 
measures of achievement that merited con- 
sideration as a criterion. Graduate students 
earn one of three possible grades: Fail F, 
Pass P, and Satisfactory S. These grades 
were assigned values of 0, 1, and 2, respectively. 
Grade point averages were computed for all 
students. The number of grades upon which 
these averages were based varied from 1 to 
12 per student. Because grades on research 
and seminar courses were S for all students, 
they were not included in the averages. 
Despite the omission of these grades the 
frequency distributions were skewed as will 
be illustrated by the facts that of 336 non- 
education students 44.6 per cent had an 
average of S and that of 145 education students 
70.3 per cent had an average of S. 

A second index based on graduate grades, 
called S/N, was computed by dividing the 
number of S grades by the total number of 
grades made. Since this index 
correlated .95 with grade point average and 
since its use as opposed to averages facilitated 
computation, it was selected as the criterion, 
although it was not considered to be a completely 
adequate one. 
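
The two grade indices described above can be sketched as follows. This is a minimal illustration, not the computation actually carried out in the study; the function names and the sample record are hypothetical, and only the 0, 1, 2 weights and the definition of S/N come from the text.

```python
# Values assigned in the text: Fail = 0, Pass = 1, Satisfactory = 2.
GRADE_POINTS = {"F": 0, "P": 1, "S": 2}


def grade_point_average(grades):
    """Mean of the 0/1/2 values assigned to a student's grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


def s_over_n(grades):
    """S/N index: number of S grades divided by total number of grades."""
    return grades.count("S") / len(grades)


# Hypothetical record of six course grades (not taken from the study).
record = ["S", "S", "P", "S", "P", "S"]
gpa = grade_point_average(record)   # 10 points over 6 grades
index = s_over_n(record)            # 4 S grades out of 6
```

Because S/N only counts S grades, it ignores the distinction between P and F, which is one reason the two indices can correlate highly without being identical.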

Before computing correlations a study of the 
homogeneity of the education versus the non- 
education groups was made. This study 
showed that the non-education and education 
students differed at the 1 per cent level of 
confidence or higher in regard to all subtest 
and total test scores and graduate grades. 
For example, on the Cooperative English 
Test the average total scores for the education 
group and non-education group were 60.29 
and 66.08, respectively. The difference 
between these yielded a Student’s t of 5.720. 
On Form W of the General Culture Test the 
average scores for the education and non- 
education group were 146.42 and 183.56, 
respectively. The difference between these 
yielded a Student’s t of 5.32. Because of these 
differences correlations were computed between 
S/N and all scores on the English Test, both 
forms of the General Culture Test, and total 
undergraduate and undergraduate major 
averages for each group. Ninety-one education 
and 158 non-education students took 
Form W of the General Culture Test; 53 
education and 170 non-education students 
took Form X. The procedure of lumping all 
non-education students was followed as a 
matter of expediency, since there were only a 
few departments with enough students to 
make it possible to calculate coefficients 
separately, and since some evaluation for the 
non-education departments as a whole was 
desired. 

The undergraduate averages were calculated 
by assigning values of 4, 3, 2, 1 and 0 to the 
grades A, B, C, D and F. Where numerical 
grades were used and letter equivalents were 
not specified, the following ranges of grades 
were given values as follows: 4 = 95-100; 
3 = 88-94; 2 = 78-87; 1 = 70-77; 0 = below 
70. Since the 492 students came from 117 
different institutions, these averages are based 
on 117 different grading standards. 
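
The conversion rule just described can be sketched as below. The function names and the sample transcript are hypothetical; the point values and numerical ranges are those stated in the text, and the handling of mixed letter and numerical grades in one record is an assumption made for the sake of a compact example.

```python
# Point values stated in the text for letter grades.
LETTER_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def numeric_to_points(mark):
    """Map a numerical grade to points using the ranges given in the text."""
    if mark >= 95:
        return 4
    if mark >= 88:
        return 3
    if mark >= 78:
        return 2
    if mark >= 70:
        return 1
    return 0


def undergraduate_average(grades):
    """Average points over a list of letter grades and/or numerical marks."""
    points = [
        LETTER_POINTS[g] if isinstance(g, str) else numeric_to_points(g)
        for g in grades
    ]
    return sum(points) / len(points)


# Hypothetical transcript mixing both grading systems.
average = undergraduate_average(["A", "B", 90, 72])
```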

While these correlations are not presented 
here in full, they may be summarized as 
follows: For the non-education group all the 
coefficients for the English Test were signif- 
icant at the 1 per cent level of confidence; 
five of the seven scores of Form W of the 
General Culture Test were significant at the 
1 per cent level; four scores of Form X of this 
test were significant at the 5 per cent or 1 per 
cent levels; and both undergraduate averages 
were significant at the 1 per cent level. For 
the education group six scores of the English 
Test had coefficients significant at the 5 per 
cent or 1 per cent levels; three scores of the 
Form W of the General Culture Test had 
coefficients significant at the 5 per cent level; 
and no scores of Form X of this test and nei- 
ther undergraduate average had a significant 
correlation. Of the 27 correlations which were 
significant at the 5 per cent or 1 per cent level 
only 5 were above .30; the highest single value 
was .35. While these correlations suggest a 
significant relationship between the predictors 
and the criterion, they are too low to be of 
much practical value. 
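
One way to see why coefficients of this size have little practical value is to convert them to the proportion of criterion variance they account for. The sketch below uses the standard relations r squared and the square root of (1 - r squared); the function names are illustrative and the calculation is not part of the original analysis.

```python
from math import sqrt


def variance_explained(r):
    """Proportion of criterion variance accounted for by the predictor."""
    return r * r


def residual_sd_ratio(r):
    """Standard error of estimate as a fraction of the criterion SD."""
    return sqrt(1.0 - r * r)


# The highest coefficient reported in the text was .35.
share = variance_explained(0.35)   # about 12 per cent of the variance
ratio = residual_sd_ratio(0.35)    # errors of prediction shrink only slightly
```

With r = .35 the errors of prediction remain roughly 94 per cent as large as they would be with no predictor at all.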


Ratings 

Nevertheless these correlations suggested 
that if a more satisfactory criterion could be 
developed, higher validity coefficients might be 
obtained. They also raised the question of 
the validity of these tests for particular 
departments. The writer accordingly inves- 
tigated faculty ratings of over-all graduate 
achievement as a more suitable criterion. 
Ratings by the rank order method and by an 
eight point graphic rating scale method were 
obtained. Ratings by both methods were 
secured on students in the biochemistry, 
biology, chemistry, education, English, history, 
psychology and religious education depart- 
ments. Securing of ratings was limited to 
these departments because they were the 
only ones with a sufficiently large number of 
students to make the computations of cor- 
relations for departments worth while and 
because the time required to secure ratings 
on the other groups would have greatly 
prolonged the time for completing the study. 
The general procedure employed was to meet 
with the members of the faculty who were to 
make the ratings in each department. The 
exact nature of the over-all trait desired was 
discussed; and a definition of this trait was 
agreed upon which was appropriate to each 
department and which was satisfactory to all 
raters in each department. Also the meaning 
of the distances along the scale, the importance 


Table 1 

Number of Raters, Number of Students Rated by Two or More Raters, 
and Horst Reliability of Ratings for Departments Studied 

                         Number of    Number of          Horst 
Department               Raters       Persons Rated      Reliability 

Biochemistry             2            [illegible]        .885 
Biology                  4            [illegible]        .800 
Chemistry                6            [illegible]        .941 
Education                2            [illegible]        .360 
English                  5            [illegible]        .944 
History                  5            [illegible]        .885 
Psychology               6            [illegible]        .905 
Religious Education      1            [illegible] 


of grading on the basis of the particular 
departmental standard, of spreading out the 
students, and of rating only persons with 
whose work the raters were personally familiar 
were discussed. 


Reliability of Scores 


Intervals along the rating scales were 
assigned arbitrary weights ranging from 0 for 
the lowest to 7 for the highest. The scales 
were scored; and each student’s score was 
the average of all his ratings. Since all 
students were not rated by the same number of 
raters, the Horst procedure,² formula 1a, 
rather than the customary correlational 
method was used to determine the reliability 
of the ratings. This procedure requires the 
rating of each person by at least two raters. 
The reliability coefficients are shown in Table 
1. Of the 7 coefficients reported, only 2 fall 
below .88. The coefficient for the education 
group (.360) is unacceptably low. 

The rank order ratings were secured from 
3 days to one week after the scale ratings were 
collected. The customary procedure was em- 
ployed in making the rank order ratings. 

The rankings for each rater were normalized 
by the use of Hull’s tables.³ The normalized 
ratings for all the raters of each person were 
averaged, ranked, and normalized again to 


² Horst, Paul. A generalized expression for the reliability 
of measures. Psychometrika, 1949, 14, 21-32. 

³ Hull, C. L. Aptitude testing. Yonkers, New York: 
World Book Company, 1928. 

give the final scores. The reliability of the 
ratings was further checked by correlating 
the average scale ratings with the rank order 
ratings. These are shown in Table 2. It 
should be noticed that two raters for education 
are indicated. This was done because rater 
A had rated only 53 and rater B, 77 of the 
146 students. Only 36 were rated by both 
raters. Since the Horst reliability of the 
ratings for these 36 was so low, it was decided 
to treat the two sets of ratings separately. 
The larger N’s used in computing these 
reliabilities as compared with the N’s used in 
computing the Horst reliabilities result from 
the fact that all rated persons are used here, 
whereas only persons with two or more ratings 
are used in the Horst computations. These 
correlations are all significant at the 1 per 
cent level. Of the 9 coefficients reported 
none is below .83 and four are .92 or higher. 
Graduate grades were also correlated with 
the rating scale ratings. These correlations 
are also shown in Table 2. Seven of the 9 
reported coefficients are significant at the 5 
per cent level; and 2 are not significant. It 
will be noticed, however, that even those 
values which are significant at the 1 per cent 
level are much lower than the coefficients 
between the rating scale and rank order ratings. 
These coefficients are not high enough to 
suggest that the two measures, ratings and 
S/N, place the students in the same relative 
order with high consistency. 
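
The normalization of rank order ratings mentioned above can be approximated as follows. This is only a sketch of the general idea of converting ranks to normal deviates; it does not reproduce Hull's tables, and the midpoint rule used for the percentile position is an assumption, as are the function name and the example ranks.

```python
from statistics import NormalDist


def normalize_ranks(ranks):
    """Convert ranks (1 = best) to standard normal deviates.

    Each rank is placed at the midpoint of its interval, so rank r among
    n persons sits at the proportion 1 - (r - 0.5) / n from the bottom.
    """
    n = len(ranks)
    unit_normal = NormalDist()
    return [unit_normal.inv_cdf(1.0 - (r - 0.5) / n) for r in ranks]


# Five hypothetical students ranked 1 through 5 by one rater.
deviates = normalize_ranks([1, 2, 3, 4, 5])
```

Averaging such deviates across raters, ranking the averages, and normalizing again would follow the same pattern as the procedure described in the text.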


Table 2 

Correlations of Average Rating Scale Ratings with 
Average Rank Order Ratings and with S/N 

                                    Rank Order     S/N 
Department                N         r              r 

Biochemistry              18        [illegible]    [illegible] 
Biology                   30        [illegible]    [illegible] 
Chemistry                 40        .89**          [illegible] 
Education (Rater A)       47        .88**          [illegible] 
Education (Rater B)       70        .83**          [illegible] 
English                   26        .94**          [illegible] 
History                   30        .94**          [illegible] 
Psychology                29        [.?4]**        [illegible] 
Religious Education       16        .83**          [illegible] 

* Significant at 5% level. 
** Significant at 1% level. 


Ratings as a Criterion 


Considering all the data, the writer believed 
that average ratings of over-all achievement 
were the better criterion. Consequently, 
Pearson Product-Moment correlations between 
ratings and all test scores and both under- 
graduate averages were computed for all 
departments. These are shown in Table 3. 

Complicating factors were encountered in 
connection with the General Culture scores. 
The number of persons in each department 
was too small to warrant computation of 
correlations for each form separately. Since 
scores on the two forms of the test are not comparable, 
the raw scores on both were converted 
into percentile ranks on the basis of the National 
Sophomore Norms; and these ranks were 
correlated with the criterion. 
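
The conversion of raw scores on two non-comparable forms to a common percentile scale can be sketched as below. The norm samples and scores here are invented for illustration; the actual National Sophomore Norms are not reproduced in the article, and the half-credit treatment of ties is one common convention rather than the one necessarily used.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Per cent of the norm group below `score`, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    tied = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * tied) / len(ordered)


# Hypothetical norm samples for the two forms (not the published norms).
norms_form_w = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
norms_form_x = [30, 40, 50, 60, 70, 80, 90, 100, 110, 120]

# The same raw score falls at different points on the two forms,
# which is why ranks rather than raw scores were correlated.
rank_w = percentile_rank(50, norms_form_w)
rank_x = percentile_rank(50, norms_form_x)
```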

For the English test 23 of the 63 correlations 
reported are significant at the 1 per cent 
level of confidence; 8 are significant at the 
5 per cent level and 4 others barely miss 
being significant at the 5 per cent level. No 
correlations for any scores are significant for 
either the biology or chemistry groups. All 
coefficients are significant at the 1 per cent 
level for history; and all except three are 
significant at the 1 per cent level for both 
raters in education. Only one subtest has a 
significant correlation for religious education. 
While no subtest has a significantly high 
correlation with the criterion for all groups, 
it is interesting to note that the validity of 
the total reading score is significant at beyond 
the 5 per cent level in 6 instances and barely 
misses being significant at this level in one 
other case. Mechanics of expression and 
effectiveness of expression each are significant 
at the 5 per cent level or higher in 5 instances. 
For the English total score the correlations 
are significant at the 1 per cent level for 5 
groups (biochemistry, education under both raters, 
English, and history) and at the 5 per cent 
level for one additional department, psychology. 
It barely misses significance for religious 
education; and it is clearly not significant for 
biology and chemistry. 


Table 3 

Correlations of Average Ratings with Test Scores and 
Undergraduate Academic Averages by Departments 

Columns: Biochemistry; Biology; Chemistry; Education (Rater A); 
Education (Rater B); English; History; Psychology; Religious Education 

Rows: 
English Test (Raw Score): Mechanics of Expression; Effectiveness of 
Expression; Vocabulary; Speed of Comprehension; Level of Comprehension; 
Total Reading Comprehension; Total English 
General Culture Test (Percentile Ranks): Current Social Problems; 
History and Social Studies; Literature; Science; Fine Arts; 
Mathematics; Total General Culture 
Academic Averages: Total Undergraduate Average; Undergraduate Major Average 

[Cell positions are not recoverable; legible entries follow in the 
order they survive, one line per N heading.] 

N=18: .50*, [illegible], .66**, .43, .44, .54*, .6(?) 
N=30: .23, .31, .08, .09, .25, .13, .27 
N=40: [no entries legible] 
N=16: .48, .48, .43, .46 
N=39: .34*, .40*, .27, .30, [illegible], .36* 
[no N heading]: .07, .04, .18, .29, .05, [illegible] 
N=31: .31, .37*, .52(?) 
N=39: .40*, .44**, [illegible] 
N=53: .39**, [illegible], .16, [illegible], .46**, .44**, .48** 
N=77: .41**, [illegible], .36**, .27*, .20, .30**, .44** 
N=26: [illegible] 
N=32: .65**, .67**, .50**, .70**, .64**, [illegible], [illegible] 
N=29: [no entries legible] 
N=52: .40**, .16, .38**, .06, [illegible], .19, [illegible] 
N=76: .22*, .22(?), .27(?), .15, .39(?), .09, .34** 
N=26: .44*, .49*, .47*, .03, .40*, .07, [illegible] 
N=31: [illegible], .39*, .43*, .23, .47**, [illegible], .49** 
N=31: [illegible] 
N=52: .09, .19 
N=73: .20, .20 
N=26: .39*, .37 

* Significant at 5% level. 
** Significant at 1% level. 

Of the 63 correlations reported for the 
General Culture test, 19 are significant at the 
1 per cent level, 18 are significant at the 5 
per cent level and 2 others barely miss being 
significant at the 5 per cent level. No subtest 
is significantly correlated with the criterion 
for the biology group. Fine Arts is signif- 
icantly correlated with the criterion for all 
groups except biology. Science is not signifi- 
cantly correlated with the criterion for any 
group. The General Culture total score 
correlates significantly with the criterion at the 
1 per cent level of confidence for every group 
except biology. 

The total undergraduate average correlates 
with ratings at the 1 per cent level of confidence 
for three groups (chemistry, history, and 
psychology) and at the 5 per cent level for 
two groups (biochemistry and English); the 
correlations are not significant for three 
groups (biology, education, and religious 
education). The undergraduate major average 
correlates with ratings at the 1 per cent level 
for only one group (psychology) and at the 5 
per cent level for three groups (biology, 
chemistry, and history). It does not correlate 
significantly for four groups (biochemistry, 
education, English and religious education). 


Test Difficulty 


The question of whether these tests, which 
were constructed for use with undergraduate 
students, are of suitable difficulty for use at 
the graduate level was studied next. In 
attacking this problem ogives showing the 
distribution of scores for the education, 
non-education and National Norm groups on which 
the critical cutting scores were based for the 
English test and both forms of the General 
Culture test were prepared. In addition 
means and standard deviations were computed 
for the departments for which validity coeffi- 
cients were computed. For the sake of brevity 
these are not shown.⁴ 
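
An ogive of the kind referred to here is simply a cumulative percentage curve over the score scale. The sketch below shows one way such a curve, and a rough check for scores piling up near the maximum, might be computed; the function names, the two-point margin, and the sample scores are assumptions and do not come from the study.

```python
def ogive(scores, max_score):
    """Cumulative per cent of cases at or below each score 0..max_score."""
    counts = [0] * (max_score + 1)
    for s in scores:
        counts[s] += 1
    cumulative = []
    running = 0
    for c in counts:
        running += c
        cumulative.append(100.0 * running / len(scores))
    return cumulative


def ceiling_share(scores, max_score, margin=2):
    """Fraction of cases scoring within `margin` points of the maximum."""
    near_top = sum(1 for s in scores if s >= max_score - margin)
    return near_top / len(scores)


# Hypothetical scores on a five-point subtest.
sample = [1, 2, 2, 3, 5]
curve = ogive(sample, 5)
top = ceiling_share(sample, 5, margin=1)
```

A large value of `ceiling_share` would correspond to the accumulation of scores at the extreme end that the text reports for the vocabulary subtest.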


⁴ The writer will furnish any interested person a 
mimeographed copy of these data. 

On the English test the ogives for the 
education, non-education and National Senior 
Norm groups followed the same general form; 
all except the ogive for vocabulary were 
essentially normal. The ogive of the education 
group was slightly below the Senior Norm 
ogive, while the ogive for the non-education 
group was slightly above the norm group 
ogive. The ranges of median scores for the 
three groups are as follows: education group 
59-63, Senior Norm group 60-65, and non-education 
group 65-69. With the exception 
of the English department, average scores for 
students in the various departments on the 
total and subtests fall in the 60’s. Some of 
the standard deviations are less than the 10 
reported for the test. Except for the vocab- 
ulary test very few students made scores very 
near the top of the scale score distributions. 
For this test there is an accumulation of 
scores at the extreme end of the distribution. 
While this test is obviously too easy for the 
groups, the difficulty level for the other 
subtests and the total score seems satisfactory. 

As for the General Culture test, for both 
forms the ogives of the three groups (education, 
non-education and National Sophomore 
Norm group) were similar in shape; but the 
ogive for the National Norm group was below 
the ogive of the education group. The ogive 
for the non-education group was above that 
of the education group. For Form W the 
differences in median subtest scores between 
the education and norm group range from 
0 to 7 score points and the differences between 
the non-education and norm group medians 
range from 6 to 14 score points. The median 
total test scores for the education and non-education 
groups are, respectively, 20 and 54 
raw score points above the norm median. 
For Form X the differences in median subtest 
scores between the education and norm group 
range from 0 to 4 score points. The differences 
between the norm and non-education 
group subtest medians range from 8 to 15. 
The median total test scores for the education 
and non-education groups are 14 and 70 points, 
respectively, above the norm median. While 
the 99th percentile point for some of the ogives 
for the non-education group is very close to 
the maximum score on some of the subtests, 
there is in no sense a clustering of scores at 
the extreme upper end of the scales. 

Even when one examines the distribution of 
the raw scores on the two forms for special 
groups, which would be expected to make 
very high scores on certain subtests, there 
appears to be a reasonable spread of scores; 
and there is no piling up of scores at the 
extreme end of the distribution. It would 
therefore appear that these tests are of 
adequate difficulty for use with graduate 
students. However, Form W seems to be 
considerably more difficult than Form X and 
appears to display a greater variability among 
scores. 


Discussion 


There are some points regarding the data 
presented above for which additional comment 
is appropriate. In respect to the rating scale 
reliabilities it should be noted that the Horst 
reliability formula is a stricter test of reliability 
than is the Pearson Product-Moment corre- 
lation coefficient. To secure a high reliability 
coefficient this formula requires not only that 
each person be assigned the same relative 
position in the distribution of ratings by all 
raters but also that the person be assigned to 
the same scale position by all raters. In the 
case of the low reliability coefficient for education, 
the Pearson Product-Moment correlation 
for the ratings of the 2 raters for the 36 people 
they rated in common was .55. The Horst 
reliability was .36. The difference arises 
from the fact that the raters were using 
different standards for ratings, as is shown by 
the fact that the average rating for Rater A 
for these people was 3.5 while the average for 
Rater B was 4.4. 
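
The distinction drawn here, between agreement in relative position and agreement in scale position, can be illustrated numerically. The sketch below does not implement Horst's formula 1a; it substitutes a one-way intraclass correlation as a stand-in index that, like the Horst coefficient as described in the text, is lowered when raters differ in their average level. The ratings are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two raters' scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def one_way_icc(x, y):
    """One-way intraclass correlation for two raters (stand-in index)."""
    n, k = len(x), 2
    means = [(a + b) / 2 for a, b in zip(x, y)]
    grand = sum(means) / n
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for a, b, m in zip(x, y, means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical ratings: Rater B is exactly one scale point more lenient.
rater_a = [2, 3, 4, 5, 3, 4]
rater_b = [3, 4, 5, 6, 4, 5]
r = pearson_r(rater_a, rater_b)      # perfect agreement in relative order
icc = one_way_icc(rater_a, rater_b)  # penalized for the difference in level
```

Here the product-moment correlation is perfect while the agreement index falls well below 1, the same direction of discrepancy as the .55 versus .36 reported for the education raters.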

The use of the National Sophomore Norms 
as a means of making scores on the General 
Culture test comparable may seem inappro- 
priate. Inspection of the scattergrams showed 
that the percentile ranks based on these norms 
produce a piling up of scores on the high end 
of the scale. It is possible that this procedure 
restricted the variability of the scores and 
thus reduced correlations. Nevertheless the 
relation between the percentile ranks and 
ratings was clearly linear; and the appropriate- 
ness of the statistic was not affected. It is 
true that percentile ranks computed for each 
form on the basis of the groups studied could 
have been used in computing the correlations. 
This procedure would have reduced the 
skewness of the percentile rank distribution; 
but it was not used for two reasons. Since 
it would probably be more valid to assume 
equality of the large sophomore groups than 
to assume equality of the relatively small 
graduate groups, it was believed that the 
National Sophomore Norms would give more 
stable norms. Also because of the rather 
close similarity in ogives for the three groups, 
there would necessarily be a high correlation 
between the percentile ranks computed on the 
National Sophomore group and those com- 
puted on the local sample. 

In evaluating the validity coefficients for 
the various departments it must be kept in 
mind that the numbers of subjects in the 
samples are relatively small and that the 
coefficients must be interpreted with caution. 
However, it appears to the writer that within 
the limitations of the Emory Graduate School 
the tests do in general display satisfactory 
validity for predicting graduate achievement 
as measured by faculty over-all ratings for 
certain departments. 

Since the biology group seems to be as 
variable as some of the other groups for which 
significant coefficients were obtained, it is 
difficult to understand why there was only one 
predictor (undergraduate major average) 
which correlated significantly at the 5 per 
cent level with ratings. The chairman of the 
biology department has stated that this 
sample of students consisted of a group whose 
performance was not on a par with the general 
run of graduate students of the department. 
With this statement in mind it is suggested 
that the results presented above for this 
department be considered inconclusive and 
not negative. 

While the total General Culture score is a 
valid predictor of over-all ratings for all 
departments studied except biology, it is 
interesting to note the subtests from which 
the total score validity largely emanates. 
Of the nine validity coefficients presented for 
each subtest, 8 are significant at the 5 per cent 
level or higher for the fine arts subtest; 7 are 
significant for current social problems; 6 are 
significant for both history and social studies 
and literature; only 3 are significant for 
mathematics; and none is significant for 
science. Whereas the total score is a valid 
predictor for all departments except biology, 
it would appear that its validity is based on 
those subtests which one would expect to be 
useful for prediction in the English, history, 
education, and religious education depart- 
ments. For the science departments, however, 
preconceptions are not confirmed. 

It is further interesting to note that for 6 
departments the validity coefficients for the 
total English score are higher than those for 
the undergraduate average and that for 5 
departments the total General Culture score is of 
higher validity than is the total undergraduate 
average. While most of these differences are 
not significant, it is obvious that the test 
scores used along with the academic transcript 
could result in improved selection procedures 
for most of these departments. 

A disturbing fact relative to the General 
Culture Test is that the various forms differ 
in respect to difficulty and score variability. 
These factors affect validity coefficients and 
make it difficult to evaluate the validity of 
the various forms of the test without an 
empirical investigation. Of the two forms 
studied Form W was more difficult than 
Form X and appears to have greater score 
variability. Accordingly it has more significant 
validity coefficients than has Form X 
(see the correlations summarized under Grades as Criterion). 


Summary 


This article reports the validity of scores on 
the Cooperative General Culture Test, scores 
on the Cooperative English Test, Higher Level, 
and undergraduate averages as predictors of 
achievement in the Emory University Grad- 
uate School. Correlations between a criterion 
based on graduate grades obtained by dividing 
the number of satisfactory grades by the 
total number of grades obtained and the 
predictors were computed for the education 
and non-education students. In addition 
faculty ratings of over-all achievement were 
correlated with the predictors for each of the 
departments of biochemistry, biology, chemistry, 
education, English, history, psychology 
and religious education. 

The findings of this study may be sum- 
marized as follows: 


1. For the non-education group all the 
English scores, the undergraduate averages and 
nearly all General Culture scores correlated 
significantly with grades. For the education 
group only six scores of the English test and 
three scores of Form W of the General Culture 
test correlated significantly with grades. 
However, most of these significant correlations 
were too low to be of much practical predic- 
tive value. 

2. The reliability of faculty ratings of over-all 
achievement was acceptable for all 
departments except education. 

3. Correlations of rating scale ratings with 
rank order ratings were high. Correlations 
of rating scale ratings with grades were with 
two exceptions significant but were consider- 
ably lower than the correlations with rank 
order ratings. 

4. The English total score correlated signif- 
icantly with ratings for all departments except 
biology, chemistry and religious education. 

5. The General Culture total score correlated 
significantly with ratings for all departments 
except biology. 

6. The total undergraduate average cor- 
related significantly with ratings for all 
departments except biology, education and 
religious education. 

7. Non-education students made higher 
scores on all tests than did education students. 
But with the exception of the vocabulary 
tests, the tests were of sufficient difficulty to 
be used with graduate students. 

Many of the correlations presented should 
be interpreted with caution because of the 
small number of persons involved in the 
computations. 


Received October 18, 1950. 








Predicting Achievement in Medical School * 


Robert Glaser 
University of Kentucky 


Data were obtained at the Indiana Univer- 
sity School of Medicine in order to evaluate 
the predictive efficiency of a battery of tests 
as an aid in the selection of medical students. 
The criterion employed in this study was 
achievement in the first year of medical 
school, as indicated by the general grade 
average at the end of the first year. This 
general average is a weighted average of the 
grades obtained by a student in specific 
courses, namely, gross anatomy, histology, 
neuroanatomy and physiology. The grade 
in each subject is weighted according to the 
number of hours the course meets each week, 
12, 5, 4, and 12 hours, respectively, for the 
subjects listed above. 
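
The weighting scheme for the general grade average can be written out as follows. The weekly hours (12, 5, 4, and 12) are those given in the text; the course grades in the example and the function name are hypothetical, and a numerical grading scale is assumed.

```python
# Weekly hours stated in the text, used as weights.
COURSE_HOURS = {
    "gross anatomy": 12,
    "histology": 5,
    "neuroanatomy": 4,
    "physiology": 12,
}


def general_grade_average(grades):
    """Hour-weighted average of the four first-year course grades."""
    total_hours = sum(COURSE_HOURS.values())
    weighted = sum(COURSE_HOURS[c] * grades[c] for c in COURSE_HOURS)
    return weighted / total_hours


# Hypothetical grades for one student on an assumed 0-100 scale.
student = {
    "gross anatomy": 80,
    "histology": 90,
    "neuroanatomy": 70,
    "physiology": 85,
}
average = general_grade_average(student)
```

Since gross anatomy and physiology together carry 24 of the 33 weekly hours, the criterion is dominated by those two courses.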

Each of the following tests was administered 
to 150 medical students at the beginning of 
their first year of medical school: 


The Differential Aptitude Tests: Space Relations (2). 
This test consists of items which 
require two-dimensional figures to be translated 
into their corresponding three-dimensional 
objects. The rationale behind this test 
was that an important requirement of the 
first-year medical student appears to be the 
ability to translate his two-dimensional textbook 
illustrations into three-dimensional life 
objects. 

The United States Armed Forces Institute 
(USAFI) Tests of General Educational Development, 
College Level, Test Three: Interpretation 
of Reading Materials in the Natural Sciences, 
hereafter referred to as the Reading Interpretation 
test (6). This is an advanced scientific 
reading comprehension test. 

The Miller Analogies Test, Form G (5). This 
is a high-level verbal analogies test which has 
been reported to have high validity for the 
selection of students for graduate work. 


* Aided by research grants from the Research Fund 
Committee, University of Kentucky and from the 
Indiana University School of Medicine. The writer 
also wishes to acknowledge the assistance of Prof. 
Delton C. Beier and Prof. Douglas G. Ellson of Indiana 
University in organizing this investigation. The computational 
labors and suggestions of Mr. Victor [surname illegible] 
of Indiana University and Mr. Eric Weingarten and 
Miss Elizabeth Ann Bicknell of the University of 
Kentucky contributed to the preparation of this report. 

The Army General Classification Test (AGCT), 
First Civilian Edition (1). This is considered 
to be a test of “general learning ability” for a 
wide range of individuals. Because of the 
restricted range of the individuals tested in 
this investigation, the standard time limit of 
40 minutes for this test was reduced to 35 
minutes. 

The Minnesota Multiphasic Personality Inventory
(MMPI) (3). This test was included
in the battery in order to get at other than 
“intellectual” factors which might be related 
to success in the first year of medical school. 


Results 


Relationships with the General Grade Average. 
The intercorrelations between the tests in the 
battery and the single-test validities are 
presented in Table 1. The MMPI is discussed 
separately and is not included in this table. 

The validity coefficients for the Reading 
Interpretation test and the Miller Analogies 
Test compare favorably with those obtained 
for the October, 1947-February, 1948 Profes- 
sional Aptitude Test (PAT) (7, 8) which was 
previously administered to the students in this 
class. The single-score validities for the 
PAT are presented in Table 2. 

The multiple correlation coefficients and 
the beta weights for the tests in the trial 
battery are given in Table 3. The multiple 
correlation of the three PAT scores, composite 
verbal ability, index of general ability and 
premedical science achievement with the 
general grade average was .42. 
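
For standardized predictors, the squared multiple correlation equals the sum of each beta weight times the corresponding validity coefficient. The sketch below applies this standard identity to the two-test battery, using the beta weights reported for the Reading Interpretation test and the Miller Analogies Test (.39 and .17) and their validities (.48 and .38). The function name is an assumption of this sketch, and the calculation is offered only as a consistency check, not as the computing procedure used in the study.

```python
import math


def multiple_r(beta_weights, validities):
    """Multiple R from standardized beta weights and single-test validities.

    Uses the identity R^2 = sum(beta_j * r_j), which holds when the betas are
    the least-squares weights for standardized scores.
    """
    r_squared = sum(beta * r for beta, r in zip(beta_weights, validities))
    return math.sqrt(r_squared)


# Reading Interpretation and Miller Analogies: reported betas and validities.
two_test_r = multiple_r([0.39, 0.17], [0.48, 0.38])
```

Here R squared is .39(.48) + .17(.38) = .2518, so R rounds to .50, in agreement with the multiple correlation reported for these two tests.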

The data presented above indicate that the 
AGCT and the Space Relations Test are of 
little value in increasing the predictive efficiency
of the trial battery. The Miller
Analogies Test adds comparatively little
to the relatively high validity coefficient of the 
Reading Interpretation test found in the 
sample studied. The beta weights for the 
Miller Analogies Test for the two-, three- and
four-variable regression equations are significant
at approximately the .05 level of confidence.


Table 1
Intercorrelations, Validity Coefficients, M’s and SD’s for the Tests in the Trial Battery

(Only the Test 3 intercorrelation column and the validity column are recoverable.)

Test                   Intercorrelation    Validity
                       with Test 3
1. Reading Interp.     .35                 .48
2. Miller Anal.        .50                 .38
3. AGCT                -                   .12
4. Space Rel.          .28                 .04





Table 2

Correlations Between PAT Scores and General Average
and the M’s and SD’s of the PAT Scores

(Column entries not recoverable.)

Test Score                       M
Verbal Ability
    Scientific
    Social
    Humanistic
    Composite
Quantitative Ability
Index of General Ability
Modern Society
Premedical Science


The MMPI was analyzed in two ways.
In the first method the “abnormal” profiles
were selected from the 150 cases. The
criterion employed to distinguish an “abnormal”
profile was the same as that given by
Meehl (4). Profiles were called abnormal
under the following four conditions:


1. Any of the eight components showed
T > 9.

2. Any of the eight components showed
T > 80, unless K < 40.

3. Any of the eight components showed
T > 70, unless K < 50 and L < 60.

4. Any of the eight components showed
T > 65, unless K < 65 and L < 60.


Thirty-one individuals or 20.6 per cent of
the group showed “abnormal” profiles according
to this criterion. This “abnormal” group
showed no indication of belonging to any
particular portion of the distribution of general
averages. Secondly, product-moment correlation
coefficients were computed between each
of the nine MMPI scores and the general
average. None of these coefficients departed
significantly from zero. With both of the
methods employed, then, the MMPI indicated
no differential validity for predicting achievement
in the first year of medical school.
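
The four profile conditions listed above amount to a small decision rule, which can be sketched as follows. This is one literal reading of the conditions, not a statement of Meehl's procedure: the function and parameter names are hypothetical, and the cutoff for the first condition is passed in by the caller rather than fixed in the code.

```python
def is_abnormal_profile(component_t_scores, k, l, top_cutoff):
    """Flag a profile as "abnormal" under a literal reading of the four conditions.

    component_t_scores: T scores on the eight clinical components.
    k, l: T scores on the K and L scales.
    top_cutoff: cutoff for the first condition (supplied by the caller;
        an assumption of this sketch).
    """
    peak = max(component_t_scores)  # "any component exceeds X" == "the peak exceeds X"
    if peak > top_cutoff:
        return True
    if peak > 80 and not (k < 40):
        return True
    if peak > 70 and not (k < 50 and l < 60):
        return True
    if peak > 65 and not (k < 65 and l < 60):
        return True
    return False


# Hypothetical profile; the value 90 for top_cutoff is illustrative only.
flagged = is_abnormal_profile([72, 50, 50, 50, 50, 50, 50, 50], k=55, l=50, top_cutoff=90)
```

In this example the peak of 72 exceeds 70 while K is not below 50, so the third condition flags the profile.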

Relationships with Specific Course Grades. 
The test scores were correlated with the 
students’ grades in their specific courses. 
The product-moment correlation coefficients 
obtained are shown in Table 4. 

A noticeable feature of this table is the
relatively high degree of relationship between
the valid test scores and grades in physiology.
This characteristic of the physiology grades
was further substantiated by the correlations
between the Professional Aptitude Test scores
and specific course grades which are presented
in Table 5. As shown in Table 6, which
presents the means and standard deviations
of the criterion grades, the variability of the
physiology grades is greater than the variability
of the grades in other courses so that
the wide range of these grades may contribute
to the comparatively high correlations. It
was suggested to the medical school that the
reliability and relevance of the physiology
grades might be greater than those of grades in the
other courses. It seems unlikely that physiology
to a greater extent than the other
courses would require the aptitudes measured
by the test battery under consideration here
and the Professional Aptitude Test.


Table 3

Beta Weights and Multiple R’s for the
Trial Test Battery

(Multiple R’s for the three- and four-test batteries are not recoverable.)

Tests                       Beta Weights    Multiple R

Reading Interpretation      .39
Miller Analogies            .17             .50

Reading Interpretation      .40
Miller Analogies            .22
AGCT                        -.13

Reading Interpretation      .41
Miller Analogies            .24
AGCT                        -.12
Space Relations             -.08


Table 4

Correlations of the Tests in the Trial Battery
with Specific Course Grades

Tests                  Anat.    Hist.    Neuroanat.    Physiol.
1. Reading Interp.     .34      .32      .28           .59
2. Miller Anal.        .31      .29      .30           .41
3. AGCT                .05      .04      .09           .17
4. Space Rel.          .14      .08      -.03          -.04


Table 5

Correlations of the PAT Scores with Specific
Course Grades

(Of the entries, only the values .34, .31, .25, .36, .28, .39, .37, .32 and .38 remain, and they cannot be assigned to cells.)

PAT Scores                  Anat.    Hist.    Neuroanat.    Physiol.
Verbal Ability
    Scientific
    Social
    Humanistic
    Composite
Quantitative Ability
Index of General Ability
Modern Society
Premedical Science


Table 6

Intercorrelations of the Specific Course Grades and
M’s and SD’s of All Criterion Grades

(Of the entries, only the values .69, .59 and .68 remain, and they cannot be assigned to cells.)

Course Grades
1. Anatomy
2. Histology
3. Neuroanatomy
4. Physiology
5. General Av.


Summary and Conclusions 


A trial test battery was studied for an 
entering class of 150 medical students at 
Indiana University in order to evaluate this 
battery as a medical school selection procedure. 
The criterion against which the tests were 
validated was the general grade average at 
the end of the first year of medical school. 

The following results were obtained: 


1. The tests of the trial battery which 
correlated highest with the criterion were the 
USAFI Test of General Educational Develop- 
ment: Interpretation of Reading Materials in 
the Natural Sciences with a correlation 
coefficient of .48 and the Miller Analogies 
Test with an r of .38. The multiple correlation 
of these two tests with the criterion was .50. 
No significant increase in validity was offered 
by the inclusion of AGCT or the Space 
Relations Differential Aptitude Test. 

2. The Minnesota Multiphasic Personality 
Inventory showed no relationship to achieve- 
ment in the first year of medical school. 

3. Physiology grades in this study showed a 
generally higher relationship with test scores 
than grades in other courses. 


Received September 28, 1950.


Personality Characteristics of Nursing School Students
and Graduate Nurses 


Irene Healy and Walter R. Borg 
University of Texas 


The need for more information concerning 
the personality characteristics related to 
success in nursing is becoming increasingly
apparent as we come to recognize the
importance of personality in the nurse-patient 
relationship. 

Sick people are dependent and look to the 
nurse to relieve their feeling of anxiety and 
fear, and to provide strong emotional support. 
Nurses who are not emotionally suited for 
the work or who have emotional conflicts 
of their own are not able to give this care and
support, so important a part of good nursing
(3). 

It has been frequently observed that some 
individuals with better than average mental 
ability, good physical health, and a high 
degree of interest do not succeed in nursing. 
Many nursing schools today have limited 
facilities for predicting the success of students 
whom they recruit and admit. Objective 
information concerning each candidate, interpreted
on the basis of reliable criteria, would
assist these schools in eliminating those who
are unsuited and would also assist in guidance
of students considering nursing as a career. 

If we can assume that the factors found in 
the majority of nurses who have completed the 
basic course and are satisfactorily engaged in
nursing have a positive relation to their
success in nursing, we have a starting point 
for further study. 


The Problem 


The objective of this investigation is to 
study and compare personality profiles of a 
group of nursing school students with a group 
of graduate nurses and a norm group. 
Emphasis will be placed upon the following 
problems: 


1. Do nursing school students or graduate 
nurses differ significantly from the norms in 
any of the personality factors measured? 


2. Do graduate nurses differ significantly 
from nursing school students in the personality 
factors measured? 

3. What is the frequency and pattern of 
scores indicating personality maladjustments 
in the test profiles of nursing school students 
and graduate nurses? 


Previous Research 


Previous research has indicated that scores 
on achievement tests in academic subjects, 
although useful in predicting success in 
nursing school subjects, cannot validly predict 
a student’s success in the nursing profession. 
Supervisors’ evaluations of student nurses
usually include personality ratings, but this 
type of rating has been shown to have little 
reliability. Many research and guidance
workers have seen the need for objective 
knowledge of the personality characteristics 
of nurses and nursing students (1, 8, 9, 10, 12, 
13), but little in the way of positive results in 
this area has been reported. 

Habbe (8) compared ward supervisors’
ratings, class grades and scores on the Thur- 
stone Personality Schedule for a group of 82 
student nurses, but found very little agree- 
ment among the variables. Garrison (4) 
tested 33 student nurses with the Bernreuter 
Personality Inventory. He found that the 
five students with the best efficiency ratings 
had higher dominance than the group average; 
other findings relating to Bernreuter scores 
were inconclusive. Nahm (11) used the 
Minnesota Personality Scale in her study of 
428 senior nursing students. In comparing 
their scores with those of university freshman women,
she found that the nursing students scored 
significantly higher in “Morale” and lower in 
“Social Adjustment” and “Economic Conservatism.”
Bennett and Gordon tested 235
entrants to nursing schools, using the
Bernreuter Personality Inventory and the
Minnesota Personality Scale, and compared
the results with personality ratings by students’
colleagues and supervisors. Their conclusions
concerning the personality measures are summarized
as follows: “To the extent that it is
possible to generalize from the findings
presented in the present study, it would appear
that the type of personality test used is of
little or no value as a part of a battery of
tests used in personnel selection, since it will
predict neither success nor the attitudes of
colleagues or supervisors” (2, p. 278).


Table 1
Mean Scores and Standard Deviations on Personality Tests for Groups Studied

(Only one column of means is recoverable; it follows the heading “Nursing Students, N = 182.” Entries for the other groups and all standard deviations are lost, and one further value, 17.6, cannot be assigned.)

Test Factors               M
S (sociability)            20.8
T (thinking introv.)       34.0
D (depression)             17.3
C (cycloid)                21.3
R (rhathymia)              36.1
G (activity)               10.3
A (soc. ascend.)           17.8
M (masc.)                  15.1
I (inferiority)            34.2
N (stability)              27.8
O (objectivity)            51.4
Ag (agreeable)             39.3
Co (cooperative)           72.8


The Method 


In the present study it was hoped that the 
use of more valid measures with a moderately 
large and representative sample would give 
positive results. 

A group of 182 nursing school freshmen, 
taken from six schools of nursing, made up
the student sample used in this study. As 
both hospital and collegiate schools were 
included, it is believed that the group studied 
is fairly representative. In traits where 
students from hospital and collegiate nursing 
schools differed significantly, these groups 
were analyzed separately. All students were 
tested early in their first semester of study. 


In addition to the students, a total of 78 
graduate nurses were studied. Nurses from 
government and civilian hospitals, and nursing 
schools were included, representing most of 
the clinical areas in nursing. 

The test battery used included the following: 
(1) The Guilford-Martin Personnel Inventory; 
(2) An Inventory of Factors STDCR; (3) An 
Inventory of Factors GAMIN. These tests 
measure a total of 13 personality factors. 
As all subjects in these two groups were 
women, and as there are no separate norms 
for women on the tests employed, a group of 
143 women students at the University of 
Texas was used as a norm group. In addition
to the test battery, a questionnaire was filled 
out by all subjects. 

All findings described as significant are 
beyond the 1% level of confidence unless 
otherwise stated. 

Findings 

Raw scores on the three personality tests 
employed can be converted into C-scores by 
the use of tables given in the test manuals. 
High C-scores on these tests indicate favorable 
characteristics and low scores unfavorable 
characteristics, although extremely high scores 
are indicative of poor adjustment in some cases.
The test authors consider that C-scores of 2
or 3 generally suggest that corrective measures 
are necessary and C-scores of 0 or 1 suggest 
extreme maladjustment, in some cases border- 
ing on the pathological (5). 

In analysis of the data, comparisons have 
been made of the nursing school freshmen, 
the graduate nurses, and a norm group made up
of 143 college women. This type of compari- 
son, however, is concerned mainly with 
averages and may overlook extreme scores 
which are often valuable in gaining an under- 
standing of the group being studied. Thus, 
C-scores of 0 and 1 (indicative of serious 
maladjustment) and C-scores of 2 and 3 
(indicating need for corrective measures) were 
tabulated for further study and comparison 
with the norms. 
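
The tabulation of extreme scores described above can be sketched as a simple count. This is a minimal illustration under stated assumptions: the function name and the example scores are hypothetical, and only the two bands named in the text (C-scores of 0 and 1, and C-scores of 2 and 3) are counted.

```python
from collections import Counter


def extreme_score_table(c_scores):
    """Return {band: (count, per cent)} for the two low C-score bands."""
    n = len(c_scores)
    if n == 0:
        raise ValueError("at least one C-score is required")
    counts = Counter()
    for score in c_scores:
        if score in (0, 1):
            counts["0-1"] += 1  # band described as extreme maladjustment
        elif score in (2, 3):
            counts["2-3"] += 1  # band described as needing corrective measures
    return {band: (counts[band], 100.0 * counts[band] / n) for band in ("0-1", "2-3")}


# Hypothetical C-scores for one factor in one group (not data from the study).
hypothetical_scores = [0, 2, 3, 5, 6, 4, 7, 1, 5, 5]
table = extreme_score_table(hypothetical_scores)
```

Running the same function separately for each factor and each group would give the counts and percentages needed for a comparison of the kind summarized in Table 2.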

The Inventory of Factors GAMIN: This test 
is designed to measure five independent 
variables. Guilford describes these variables 
as follows: 


“G—general pressure for overt activity.
A—ascendancy in social situations as opposed 
to submissiveness; leadership qualities. 
M—masculinity of attitudes and interests as 
opposed to femininity. 
I—lack of inferiority feelings; self-confidence. 
N—lack of nervousness and irritability” (6). 


In comparing the mean scores of graduate 
nurses, nursing school students and college 
women and the variability of these scores, 
a number of small but significant differences 
were found. Although the mean scores of 
neither nursing group were significantly differ- 
ent from the norms in the G factor, students 
were significantly higher than graduate nurses 
in this area at the five per cent level of confidence.
This may possibly have been due to
age difference. Graduate nurses also had more 
scores in the two extreme categories which are 
considered to be indicative of maladjustment. 

In factor A, social ascendancy, mean scores 
show nursing students to be significantly 
less ascendant than college women. The 
nursing student group also had a slightly 
higher percentage of scores in the categories 
indicating extreme submissiveness. Graduate 
nurses showed considerably fewer inferiority
feelings than the other groups and had fewer 
cases of extreme inferiority scores. Student
nurses, particularly those from hospital nursing
schools, had a considerable number of
low scores in this factor. A comparison
of mean scores on nervousness showed 
the graduate nurses to be significantly 
less nervous than nursing students or the 
norm group. The graduate nurses group also 
had very few scores indicative of extreme 
nervousness, suggesting that this trait makes 
success in the field difficult. Table 2 shows 
the marked differences between scores of 
graduate nurses and the other groups in this 
factor. There were no significant differences 
among the three groups in masculinity-femininity
scores.

The Inventory of Factors STDCR: The five 
personality variables measured by this test 
are described by Guilford as follows: 


“S—Social introversion—extraversion.—Shyness,
seclusiveness, tendency to withdraw
from social contacts, versus sociability,
tendency to seek social contacts and to
enjoy the company of others.

T—Thinking introversion—extraversion.—An
inclination to meditative or reflective
thinking, philosophizing, analysis of one’s
self and others, versus an extravertive
orientation of thinking.

D—Depression.—Habitually gloomy, pessimistic
mood, with feelings of guilt and
unworthiness, versus cheerfulness and
optimism.

C—Cycloid disposition.—Strong emotional
fluctuations, tendencies toward flightiness
and emotional instability, versus uniformity
and stability of moods, evenness
of disposition.

R—Rhathymia.—A happy-go-lucky, carefree
disposition, liveliness, impulsiveness, versus
an inhibited, over-controlled, conscientious,
serious-minded disposition” (5).


A comparison of the three groups studied
in scores on the S factor reveals both nursing
students and graduate nurses to be significantly
more introverted than the norm group
of college women. These groups also had
considerably more extreme scores than the
norm group, as may be seen in Table 2. Graduate
nurses were significantly more cheerful
and optimistic than either nursing students or
norms according to mean scores on the depression
factor and had only about half as many
cases of marked depression. It may be that
this factor is of considerable importance to
success in many areas of nursing. Graduate
nurses were also found to be significantly
more stable emotionally than both student
nurses and the norm group. Only 6% of the
graduate nurses fell into the extreme categories
on factor C (cycloid disposition), while 20%
of the nursing students and 19% of college
women had extreme scores. The small number of graduate
nurses scoring as extreme cycloids suggests
that emotional stability is essential to success
in many areas of nursing. Scores on factor
R suggest that graduate nurses are significantly
more over-controlled and conscientious
than student nurses and norms. It seems
likely that a certain amount of over-control
is demanded by the nursing profession as
student nurses also show this trend, although
to a lesser degree. There are no marked
differences in percentages of extreme scores
among the groups studied in this factor.


Table 2
Number and Per Cent of Each Group Scoring in Extreme Categories

(Surviving column headings: S, T, D, C, N. Cell entries not recoverable.)

Graduate Nurses, N = 78
    C-Scores 0 and 1: N, %
    C-Scores 2 and 3: N, %

Nursing Students, N = 182
    C-Scores 0 and 1: N, %
    C-Scores 2 and 3: N, %

Norm Group, N = 143
    C-Scores 0 and 1: N, %
    C-Scores 2 and 3: N, %

There were no significant differences in 
mean scores on thinking introversion-extraver- 
sion among the three groups. The graduate 
nurses had fewer scores at the extremes. 

The Personnel Inventory: This test was 
designed to single out individuals who are
maladjusted in their work and also to extend
the number of personality factors measurable 
with the Guilford-Martin battery. This test 
measures three factors described by Guilford 
as follows: 


“O—Objectivity (as opposed to personal
reference or a tendency to take things 
personally). 

Ag—Agreeableness (as opposed to belligerence 
or a dominating disposition and an 
overreadiness to fight over trifles). 

Co—Cooperativeness (as opposed to fault- 
finding or overcriticalness of people and 
things)” (7). 


Scores on objectivity show the graduate 
nurse to be highly objective as compared with 
student nurses and norms. Table 1 shows 
student nurses to be slightly below the norms 
in this trait. Further analysis, however, 
showed college nursing students to be slightly 
above the norm and hospital students to be 
significantly lower. This low level of objectiv- 
ity of the student in the hospital school raises 
the question as to whether these schools are 
adequately training their students in scientific 
thinking. Comparison of extreme scores on
this trait shows that only 7% of graduate
nurses score in the two low categories as 
compared to 24% of the nursing students and
27% of the norm group. In trait Ag, agreeableness,
we again find relatively few extremely
low scores in the graduate nurse group (see 
Table 2). Student nurses from college schools 
are significantly higher than norms and hospital 
school nursing students. Graduate nurses are 
significantly above all other groups studied in 
this trait, suggesting that agreeableness might 
be of considerable importance to nursing 
success. Of all traits studied, differences in 
cooperativeness are most marked. Graduate 
nurses are significantly above all other groups 
in mean scores for this trait; college nursing 
students are close to the norm; and hospital 
nursing students are significantly below the 
other groups studied. Only 4% of the 
graduate nurses scored in the two low 
categories on this trait as compared to 23% 
of the student nurse group and 16% of the 
norm group. These findings seem to substantiate
the subjective observations made by
faculty members and supervisors in nursing 
schools that cooperativeness is of great im- 
portance in nursing. 


Summary 


The Guilford-Martin battery of personality 
tests, measuring thirteen factors, was admin- 
istered to 182 beginning students from six 
hospital and collegiate schools of nursing and
78 graduate nurses. Scores of the graduate 
nurse and nursing student groups were 
compared with each other and with a norm 
group made up of 143 women college students. 
Differences between means and frequencies of 
extreme scores, suggestive of maladjustment, 
were studied. In reference to the problems 
stated at the beginning of this paper, the 
following findings can be reported: 

1. Do nursing school students or graduate 
nurses differ significantly from the norms on 
any of the personality factors measured? 

The mean scores of the graduate nurse 
group were significantly more favorable than 
those of the norm group on factors relating to 
Inferiority Feelings, Nervousness, Depression, 
Emotional Stability, Objectivity, Agreeableness,
and Cooperativeness. Their scores
indicated that they were significantly more 
socially introverted and less rhathymic than 
the norm group.

Mean scores of nursing school students were
significantly less favorable than the norm 
group in Social Extraversion, Depression, 
Social Ascendancy, and Cooperativeness. 
Nursing students in hospital schools were 
particularly low on Cooperativeness. 

2. Do graduate nurses differ significantly 
from nursing school students in the personality 
factors measured? 

In comparing the nursing student with the 
graduate nurse a number of interesting per- 
sonality differences are found. The graduate 
nurse is more stable emotionally, has more 
self-confidence and is less nervous. The 
Personnel Inventory shows graduate nurses 
to score more favorably in Objectivity, Agree- 
ableness, and Cooperativeness with all three 
differences highly significant. Nursing school 
students were significantly less over-controlled 
and showed greater general pressure for overt 
activity as compared with graduate nurses. 

3. What is the frequency and pattern of 
scores indicating personality maladjustments 
in the test profiles of nursing school students 
and graduate nurses? 


In examining the percentages of each group
scoring in the range considered by Guilford to 
be indicative of maladjustment, the graduate 
nurse group again showed more favorable 
adjustment than the nursing students. They 
showed markedly lower percentages of extreme 
scores in Thinking Introversion, Depression,
Cycloid Disposition, Inferiority Feelings, 
Nervousness, Objectivity, Agreeableness and 
Cooperativeness. The nursing students had 
fewer extreme scores in factor G (general 
pressure for overt activity). 


Conclusions 


No characteristic patterns appear in analysis 
of scores of the beginning nursing students. 
This is to be expected to some extent as the 
students were not screened to any great 
extent in most of the nursing schools studied, 
and the data were collected early in the 
semester, before students not fitted for the
program had dropped out. A follow-up of
dropouts and comparison of them with success- 
ful students is now in progress and may reveal 
some differences in the two groups. 

In studying the scores of graduate nurses,
a pattern of traits seems to have emerged.
The graduate nurse appears as a person with 
self-confidence and emotional stability, lacking 
nervous tenseness, cheerful and optimistic, 
agreeable, cooperative and objective. Com- 
parison of mean scores and study of extremes 
both indicate that the above factors are of 
importance. Most of these traits have appeared 
on nursing supervisors’ rating scales for years.
In the past, however, the supervisor had no 
objective evidence that these factors were im- 
portant and no objective way of measuring the 
traits, assuming that they were important. 
The objective evidence available in this study 
suggests that a nurse can be successful in her 
profession and still be low in one or two of these 
traits (I, C, N, D, O, Ag, Co), but it is doubtful 
if a person low in a majority of these traits 
would be successful in nursing. Only three of 
the 78 graduate nurses studied had low scores 
on more than two of these traits. 

One of these subjects had extremely low 
scores in the factors Inferiority, Nervousness, 
Objectivity, Agreeableness, and Cooperative- 
ness. Reports from her supervisors substan- 
tiate the test results, describing her as 
“uncertain, under emotional stress, manifesting 
poor cooperation and an attitude of antagonism 
toward other staff members.” 

Another with low scores in factors Inferior- 
ity, Nervousness, and Objectivity was reported 
by her supervisor as being uncertain as to 
what she wanted to do in the field of nursing, 
or for that matter, whether she would remain 
in nursing at all. She was reported also as
having numerous physical complaints for 
which physicians have been unable to find 
any organic basis. 

The third subject, with low scores in 
Emotional Stability, Inferiority, Nervousness, 
Objectivity, Agreeableness and Cooperative- 
ness was reported by those who observed her 
as being somewhat socially withdrawn, slow 
in reacting to new situations, and indecisive in 
situations involving any change in status or 
responsibility. She was also considering leaving
her position. These three cases, though
not conclusive, certainly lend some weight
to the conclusion that the above-mentioned 
traits are important to success in nursing. 

Further research will be necessary before 
the above data can safely be applied to the 
screening of entering nursing school students. 
In their present form, however, these data
are considered sufficiently reliable for
use in vocational guidance, and as an aid
in analysis of difficulties of nursing school 
students and graduate nurses who are failing 
to succeed. 


Received October 20, 1950.


The Selection of Nurses in England *


Asenath Petrie and Muriel B. Powell 
St. George’s Hospital, London 


The Lancet Commission on Nursing (12) 
and the Working Party on the Recruitment 
and Training of Nurses (23) both asked for 
more research into selection methods for 
nurses. The Working Party (23) reported 
that in their sample, one in every three nurses 
wasted should not have been taken on for 
training. They stressed the need to choose 
people who, both in ability and temperament, 
were capable of profiting from training. 

Since at St. George’s Hospital the number of 
applications from potential nurses is such that 
seven out of every ten have to be rejected, 
the improvement of selection procedure was of 
immediate practical importance. It was 
hoped that our investigation would also throw 
some light on the personality of the good and 
poor nurse and provide information about 
individual nurses which could be used for 
counseling purposes, and thus help further to 
avoid wastage. 

Selection tests for nurses have been used for 
some time in the United States of America 
(20, 7). The methods in more general use 
consist primarily of tests of intelligence, 
aptitude, and educational achievement.
There has been some research into the use of 
objective and subjective personality tests in 
selection (8, 13, 2), but these methods, though 
promising, have not as yet replaced the more 
cognitive approach to selection which is 
based on the intellectual rather than the 
temperamental ability of the potential nurse 
(22). In England the use of intelligence tests 
or general knowledge tests in selection has been 
only sporadically reported. 

* Mrs. Petrie is psychologist and Miss Powell is
matron at St. George’s Hospital. We express our
thanks to the Board of Governors, and Mr. P. H. Constable,
the House Governor of St. George’s Hospital,
for their cooperation in arranging these investigations;
to Dr. Desmond Curran, head of the Department of
Psychiatry, for his constant help and encouragement;
and to the sister-tutors, sisters, and nurses, without
whose cooperation this study could not have been made.


The study of previous work in this field
suggested that personality was likely to be
at least as important as intelligence in the
good nurse, and that some aspects of intelligence
might prove more important than
others. It was therefore decided that the
investigation should include several tests of
different aspects of intellectual ability and that
objective tests of personality should also be
used, measuring both personality characteristics
and attitudes.

It was clearly essential that the criterion by 
which the selection methods were to be 
evaluated should be reliable and should be 
based upon those qualities essential in a good 
nurse. It was agreed that the official nursing 
examinations, though they may be an indica- 
tion of whether or not a student nurse has 
reached the minimum academic standard 
required, are not an indication of the all- 
round performance of nurses. 

As there was no other reliable indication 
available of a student nurse’s standing, which 
included her ability in the ward, the classroom, 
and as a member of the nursing staff of the 
hospital, we had to try and arrive at one. 
The method we finally adopted was based on 
rating scales. 


Outline of Investigation 


Our sample consisted of 126 nurses who
had been training at the hospital for more
than six months.

The rating scale,¹ which is given in Table 1,
was devised by the authors after examining
both English and American scales intended for
nursing and allied professions. The rating
was carried out after the nurses had been in
training in the hospital for at least eighteen
months. It asked for a five point rating on
each of eighteen personality and ability
traits. Each nurse was rated by three
independent judges who knew her well; these
were the Matron, the Ward Sister and the
Sister Tutor. The average intercorrelation
between these three judges was +.649.


¹ Tables 1, 4, and 5 have been deposited with the
American Documentation Institute. Order Document
3245 from American Documentation Institute, 1719 N
St., N.W., Washington 6, D. C., remitting $1.00 for
microfilm (images 1 inch high on standard 35 mm.
motion picture film) or $1.00 for photocopies (6 x 8
inches) readable without optical aid.

The Total Rating for each nurse was the 
sum of the ratings given by the three judges on 
the eighteen traits. Thus a nurse who could 
be rated as being excellent by all three judges 
on each of the eighteen traits would have a 
Total Rating of 270. The Total Rating, 
unless otherwise stated, is our criterion 
throughout. 
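
The arithmetic behind the Total Rating can be sketched as follows. This is only an illustration: the function name, the data layout, and the assumption that the five point scale runs from 1 to 5 are ours and are not stated in the text.

```python
# Minimal sketch of the Total Rating described above.
# Assumption: each rating is an integer from 1 (lowest) to 5 (excellent).
JUDGES = ("Matron", "Ward Sister", "Sister Tutor")
N_TRAITS = 18
MAX_POINTS = 5


def total_rating(ratings):
    """Sum the ratings of the three judges over the eighteen traits.

    `ratings` maps each judge's title to a list of 18 integer ratings.
    """
    total = 0
    for judge in JUDGES:
        scores = ratings[judge]
        if len(scores) != N_TRAITS:
            raise ValueError(f"{judge}: expected {N_TRAITS} ratings")
        if any(not 1 <= s <= MAX_POINTS for s in scores):
            raise ValueError(f"{judge}: ratings must lie between 1 and {MAX_POINTS}")
        total += sum(scores)
    return total


# A nurse rated excellent by every judge on every trait.
best_possible = {judge: [MAX_POINTS] * N_TRAITS for judge in JUDGES}
print(total_rating(best_possible))  # 3 judges x 18 traits x 5 points = 270
```

The maximum of 270 quoted in the text is thus 3 x 18 x 5.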

The ratings by the three judges for each of 
the eighteen traits were correlated with one 
another and with the Total Rating. These 
correlations are presented in Table 4. It will 
be seen that the relationship between the 
Total Rating and those on the Individual 
Traits was found to be very high (average 
r = .747). 

Each of the nurses was given both individual 
and group tests of personality and intelligence 
which are listed in Table 2. These included 
personality tests on which considerable work 
has been done recently such as word connection 
test, “plodding,” speed-accuracy preference in 
manual and clerical tasks, and manual dexter- 
ity. These have been found to relate to 
neuroticism and introversion and extroversion 
(4, 6, 10, 16, 17, 18, 19). Verbal, non-verbal 
and performance tests of intelligence were 
also included. The scores on these tests 
were intercorrelated with the Total Rating. 


Results 


In Table 2 are listed the tests used and their 
correlation with the criterion. Multiple cor- 
relations were calculated between the criterion 
and the twelve test results giving the highest 
correlation coefficients with it. In the final 
multiple correlation we included tests num- 
bered 1, 2, 3, 4, 7, 8, 9, 10, 13, 14, 16, 17 (as 
listed and numbered in Table 2). 

The size of the final multiple correlation 
was .611. Table 3 indicates how the multiple 
correlations increase in size with the addition 
of the various tests. 
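
How a multiple correlation grows as tests are added can be illustrated for the simplest case of two predictors. The sketch below uses the standard two-predictor formula; the three input correlations are invented for illustration and are not values from this study.

```python
import math


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values: two tests correlating .30 and .25 with the criterion
# and .10 with each other.
r_first_test_alone = 0.30
r_both_tests = multiple_r_two_predictors(0.30, 0.25, 0.10)
print(round(r_both_tests, 3))
```

With these hypothetical figures the second test raises the correlation from .30 to about .37; tests that overlap heavily with those already included add correspondingly less.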

2 Op. cit. 


Table 2 


List of Tests and Their Correlation with the Criterion 


Tests 

1. Accurate clerical observation. Minnesota Clerical Test. I.† Number of mistakes in comparison of figures. 
2. Accurate clerical observation. Minnesota Clerical Test. II. Number of mistakes in comparison of names. 
3. Manual dexterity. Average score on O’Connor Tweezer Test. 
4. Persistence at a task. Productivity in word building test. 
5. Persistence at a task. Time spent on word building test. 
6. Kent-Shakow Performance Intelligence Test. Intelligence score. 
7. Kent-Shakow Performance Intelligence Test. Number of mistaken moves. 
8. Non-verbal Intelligence. Penrose Pattern Perception Test (14). 
9. Crown Word Connection Test. “Neurotic” score. 
10. Adaptation of Interest Test Pressey X-O. 
11. Adaptation of Annoyance Test Pressey X-O. 
12. Verbal Intelligence. Mill Hill Vocabulary Test (21). 
13. Speed Accuracy Preference. Number of mistakes in trial on track tracer when accuracy is stressed. 
14. Concentration Test, involving figures. 
15. Distractibility Test. Number of mistaken answers. 
16. Strength of maximum grip on the Dynamometer. 
17. Educational Status. 


—.175* 
152 


—.121 


* Starred correlations are significant at the 5 per cent 
level. Most of the others are in the predicted direction 
and would be significant if the one tail test were applied. 

† Permission to use this was kindly given by the 
Psychological Corporation, New York 18, New York. 


All correlations are calculated by the 
Product Moment formula and have been 
corrected for attenuation in the criterion by 
Dunlap and Cureton’s Formula (5). The 
multiple correlations reported are certainly 
higher on this group than they would be on a 
new one, because of the well-known tendency 
for the multiple correlation formula to 
summate errors. 
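
The two adjustments mentioned in this paragraph can be illustrated numerically. Dunlap and Cureton's formula is not reproduced in the text, so the sketch below substitutes familiar textbook formulas as stand-ins: the Spearman-Brown estimate of the reliability of the pooled judges, the usual correction for attenuation in the criterion alone, and the common adjusted R-squared estimate of shrinkage. Only the figures .649, .611, 126 and 12 come from the text; the validity of .30 is hypothetical.

```python
import math


def pooled_reliability(mean_r, k):
    """Spearman-Brown estimate of the reliability of k pooled judges."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def correct_for_criterion_attenuation(r_xy, r_yy):
    """Textbook correction for unreliability in the criterion only."""
    return r_xy / math.sqrt(r_yy)


def shrunken_multiple_r(r, n_cases, n_predictors):
    """Adjusted multiple R (common adjusted R-squared formula)."""
    adjusted = 1 - (1 - r ** 2) * (n_cases - 1) / (n_cases - n_predictors - 1)
    return math.sqrt(max(adjusted, 0.0))


# Average intercorrelation of the three judges, as reported in the text.
criterion_reliability = pooled_reliability(0.649, 3)

# Hypothetical observed validity of .30 for a single test.
corrected_r = correct_for_criterion_attenuation(0.30, criterion_reliability)

# Reported multiple correlation, sample size and number of tests.
shrunken_r = shrunken_multiple_r(0.611, 126, 12)

print(round(criterion_reliability, 3), round(corrected_r, 3), round(shrunken_r, 3))
```

On these assumptions the pooled criterion has a reliability of roughly .85, and the multiple correlation of .611 would be expected to fall to about .55 in a new group. These are illustrative estimates, not figures reported by the authors.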

The time taken to give the tests is 1 hour 
and 41 minutes, of which 56 minutes is spent 
in Group Testing. If the period available for 
selection procedures is very limited, it is 
possible, by omitting tests 4 and 7, to obtain 
a relatively high correlation in 1 hour and 6 
minutes, of which 56 minutes is spent in 
Group Testing. 

As some of these tests are not widely known, 
a very brief description will be given. The 
Track Tracer is a machine designed in the 
Cambridge Laboratory and kindly loaned by 
Professor Bartlett. It consists of an ivorene 
sheet attached to a metal base. The ivorene 
sheet is perforated by pairs of holes which are 
irregularly spaced, but which define a curved 
path which leads from the outside to the 
center of the board. The student’s task is 
to trace the path between these pairs of holes 
to the center of the board. If she goes into a 
hole, electrical contact is made through the 
metal sheet with an electric counter and an 
electric buzzer. We thus have a measure of 
both the speed and accuracy with which the 
student carries out such trials. The student 
is asked on the first trial to do the task as 
quickly and as accurately as she can, both 
words being equally stressed. On the second 
trial she is asked to do it as quickly as she can 
and on the third trial as accurately as she can. 
The number of mistakes made in the last 
trial, in which accuracy is stressed, is the 
score which shows the highest correlation 
with the criterion. 


Table 3 


The Multiple Correlations with the Criterion Achieved 
by Successively Adding the Individual Tests 


1. Maximum grip on dynamometer 
2. Manual dexterity—O’Connor tweezer test 
3. Number of mistakes on Track Tracer 
4. Concentration test 
5. Plodding test—Productivity in word building 
6. Kent-Shakow form boards—Number of mistaken moves 
7. Word Association Test 
8. Number of Interests in Adaptation of Pressey X-O 
9. Minnesota Clerical Test—Number Comparison 
10. Minnesota Clerical Test—Name Comparison 
11. Penrose Pattern perception—Non-verbal intelligence (14) 
12. Educational Status 

Concentration is measured by the ability 
to repeat groups of figures from a series which 
is interrupted at irregular intervals. The 
student is asked to try and remember the 
last six numbers which are read out, or as 
many as she can, in the order they are given. 
Eight series were given and the score was the 
total number of figures correctly remembered. 

The test of “plodding” used was a word 
building test in which the maximum number of 
words had to be made out of a nine letter 
word. The score was the number of words 
produced. 
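
The scoring of the word building test can be sketched as follows. The stimulus word, the responses, and the rule that repeated words count once are all hypothetical; the text states only that the score was the number of words produced, and the sketch does not check that a response is a genuine English word.

```python
from collections import Counter


def can_build(word, source):
    """True if `word` can be spelled using the letters of `source`,
    each letter being used no more often than it occurs in `source`."""
    needed = Counter(word.lower())
    available = Counter(source.lower())
    return all(available[letter] >= count for letter, count in needed.items())


def plodding_score(responses, source):
    """Number of distinct responses that can be built from `source`.

    Assumption: a word written twice is counted once.
    """
    valid = {w.lower() for w in responses if w and can_build(w, source)}
    return len(valid)


# Hypothetical nine letter stimulus word and a short list of responses.
stimulus = "EDUCATION"
responses = ["cat", "note", "dice", "action", "cat", "moon"]
print(plodding_score(responses, stimulus))  # "moon" is rejected, "cat" counts once
```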

Crown’s Word Connection List (4) has been 
developed as a group Word Association Test. 
It allows for a choice reaction between associa- 
tions which are known to be commonly 
found among neurotic patients and those 
which are found among normal individuals. 


Factorial Analysis 


In order to understand better the grouping 
of traits and the personality requirements 
involved in the rating scale, a Factorial 
Analysis of the traits was carried out. 
Thurstone’s Centroid Method was used and 
produced a General Factor, accounting for 
55 per cent of the variance, and a Bi-Polar 
Factor accounting for 12 per cent of the 
variance. In Tables 4 and 5² are given the 
intercorrelations of the ratings and their 
Factor saturations. The numbers of each 
rating are as given in Table 1. 
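
The extraction of a first centroid factor can be sketched in a few lines. This is a simplified illustration, not the authors' computation: it leaves unities in the diagonal of the correlation matrix, whereas centroid analyses ordinarily insert communality estimates, and the small matrix below is invented.

```python
import math


def first_centroid_factor(corr):
    """First centroid factor of a correlation matrix.

    Each saturation is the column sum divided by the square root of the
    grand total of the matrix. Returns the saturations and the share of
    the total variance they account for.
    """
    n = len(corr)
    if any(len(row) != n for row in corr):
        raise ValueError("correlation matrix must be square")
    column_sums = [sum(corr[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(column_sums)
    if grand_total <= 0:
        raise ValueError("grand total must be positive")
    saturations = [s / math.sqrt(grand_total) for s in column_sums]
    variance_share = sum(a * a for a in saturations) / n
    return saturations, variance_share


# Invented intercorrelations of three traits (unities in the diagonal).
example = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
saturations, share = first_centroid_factor(example)
print([round(a, 3) for a in saturations], round(share, 3))
```

A statement such as "accounting for 55 per cent of the variance" corresponds to the second value returned, the mean of the squared saturations.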

The traits in the rating scale most highly 
saturated with the General Factor were 
numbers 4, 3, 8 (e.g., general finish and 
smoothness of performance of clinical work; 
skill; accuracy and speed and attention to 
detail; quietness). It therefore seems appro- 
priate to regard the first factor as being one of 
“general nursing efficiency.” 

The Bi-Polar Factor showed four traits in 
the rating scale with high positive saturations 
and four traits with high negative saturations. 
The traits with high positive saturations were 
numbers 2, 6, 1, 13 (e.g., knowledge of underlying 
principles of nursing practice and 
nursing skills). These traits are clearly 
characterized by being related to intellectual 
capacity and learning. 


2 Op. cit. 

The traits in the rating scale with high 
negative saturations were 9, 10, 11, 12 (e.g., 
satisfactory relationship with the patients; 
ability to gain their co-operation; patience; 
understanding; kindness; sympathy). These 
traits can be characterized as aspects of 
personality related to dealing with other 
human beings and social attitudes in general; 
aspects, it may be, of what is generally meant 
by character. 

There are thus two very different demands 
made of a good nurse. One is that she has 
enough mental ability to cope with her work, 
the other that she has the kind of temperament 
that allows her to make the maximum use of 
her ability in this profession. 

These two sets of rating items which have 
been listed above as showing the highest 
positive and negative saturation with the 
Bi-Polar Factor were summed for each 
nurse. The two totals, representing respectively 
intellectual and social relationship traits, 
were correlated with the scores on each of the 
tests. The more interesting correlations are 
given in Table 6. It was found that some of 
the tests were more closely related to the 
“ability” traits, others to the “personal 
relationship” traits. For example, both our 
non-verbal and verbal tests of intelligence were 
closely related to the ratings on “ability,” but 
unrelated to the ratings on “personal 
relationship.” The score on a test of neuroticism, a 
group Word Connection test, is more highly 
correlated with the “human relationship” 
traits than with the “ability” traits. On the 
O’Connor Tweezer Test of manual dexterity, 
in which neurotics are known to have a poorer 
performance than the rest of the population, 
the scores appear to be related to both the 
“ability” and “human relationship” aspect, 
and so on. It is also of interest that the 
marks on the practical examination given in 
this hospital prior to the Preliminary State 
Examination are related to both sets of traits; 
the theoretical examination marks are chiefly 
related to the “ability” traits. 
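
The procedure of summing the two groups of traits and correlating each total with a test score can be sketched as follows. The trait numbers are those given in the text; the data layout, the assumption that the totals are summed over all three judges, and the example figures are ours.

```python
import math

ABILITY_TRAITS = (1, 2, 6, 13)
PERSONAL_RELATIONSHIP_TRAITS = (9, 10, 11, 12)


def subscale_total(ratings_by_judge, traits):
    """Sum the ratings on the chosen traits over all judges.

    `ratings_by_judge` is a list with one dict per judge, mapping the
    trait number (1 to 18) to that judge's rating.
    """
    return sum(judge[t] for judge in ratings_by_judge for t in traits)


def pearson_r(xs, ys):
    """Product moment correlation between two equally long lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 cases")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical "ability" totals and test scores for four nurses.
ability_totals = [40, 45, 50, 55]
test_scores = [10, 12, 11, 15]
print(round(pearson_r(ability_totals, test_scores), 3))
```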


Examinations 


Much has been written about the desirability 
of changes in nursing education and in the 
system of examinations (1, 3, 23). It is as a 
slight contribution to this discussion that we 
mention a finding in connection with the 
theoretical and practical examinations. The 
marks of the nurses on these examinations 
were correlated with the Total Rating of the 
three independent judges. The correlation 
was .404 with the theoretical examination 
and .377 with the practical examination. 
Moreover, the group of traits which are 
related to personal relationships, when 
correlated with the theoretical examination, gave 
a co-efficient of .152. Thus, the theoretical 
examination, insofar as it is at all related to the 
selection of the good nurse, is related almost 
entirely to her intellectual qualities. There is 
a slightly happier finding with regard to the 
practical examination, where the marks are 
equally related both to the group of “ability” 
traits and the group of “personal 
relationship” traits. The extent of the relationship, 
however, is represented by a correlation 
co-efficient of only .36. 


Table 6 


Correlations of “Ability” and “Personal Relationships” 
Traits with Some of the Tests Used 


                                                      Ability        Personal 
                                                      Traits         Relationship 
                                                      (Traits 1,     Traits (Traits 
Tests                                                 2, 6, 13)      9, 10, 11, 12) 

O’Connor Tweezer. Average number on two trials         +.286          +.207 
Word Building. Number of words produced                +.302          +.136 
Kent-Shakow Form Boards. Number of mistaken moves      —.306          —.132 
Penrose Pattern Perception Test. Number correct        +.270          +.042 
Crown Word Connection Test. Neurotic Score             —.160          —.219 
Educational Status                                     +.369          +.131 
Mill Hill Vocabulary Test                              +.269          —.086 
Nursing Examination—Theoretical                        +.482          +.152 
Nursing Examination—Practical                          +.358          +.363 

The examination marks were correlated with 
the Verbal and Non-Verbal Tests of Intelligence 
and with the number of mistakes made 
on the performance test⁴ in our battery. The 
sizes of these correlations and of the 
correlations between the Total Rating and these 
tests are given in Table 7. 


4 We had decided to note the number of false moves 
made in this test as it has been shown with medical 
students (15) and University students (9) that a tendency 
to make a relatively large number of mistakes is 
negatively associated with success and positively 
associated with other measures of neuroticism. The number 
of false moves in the Performance Test was found to be 
negatively related to our criterion and to be positively 
related to our measures of neuroticism. 


Table 7 


The Correlation Coefficients Between the Intelligence 
Tests and the Examinations and 
the Total Rating 


                                        Tests 

                     Verbal            Non-Verbal          Intelligence 
                     Intelligence      Intelligence        Test 
                     (Mill Hill        (Penrose Pattern    (Kent-Shakow 
Examination          Vocabulary)       Perception)         Form Boards) 

Theoretical Exam.      .336              .391                —.171 
Practical Exam.        .075              ...                 —.211 
Total Rating           .128              ...                 —.240 


It will be noted that the Non-Verbal Test 
of Intelligence (14) is related to success at 
both examinations and the Total Rating. 
The other test scores are related primarily to 
only one of the examinations. The relationship 
is not very close and suggests that, even 
if we were choosing nurses solely with a view 
to their being able to pass examinations, we 
would nevertheless be ill advised to rely on 
purely intellectual measures. 


Age and Educational Level 


It is also of interest that there was a tendency 
for the older nurse to be the better nurse 
(r = .248). The age range of nurses 
commencing training is 18} to 30, and it may be 
that training, as at present designed, is more 
suitable for older students. There is also a 
tendency in our group for the individual 
who has fewer brothers and sisters to be the 
better nurse. 

The educational status of the nurses was 
classified on a 5 point scale, the lowest category 
containing those whose education had ceased 
at the age of fourteen, while the highest 
category contained those who had proceeded 
beyond Higher Schools. It was found that 
the degree of education was related in our 
group to the success of the nurses (r = .31). 


Discussion 

This selection procedure is not intended to 
replace the interview, study of school records, 
and all that has been done hitherto to maintain 
the high standard of the potential nurse. 
But we suggest that selection tests be added 
to the earlier methods because they provide 
additional information. The assessment of the 
nurses will be repeated each year, so that their 
standing at different stages in the training 
can be related to the scores on these personality 
tests. 

As the Working Party (23) states, “Wastage 
comes, not only from asking too much of a 
dull girl... It arises also from lack of 
appreciation of the gifts of brighter students.” 
These tests give considerable information 
about the ability and temperament of the 
individual nurse. It is hoped that those in 
charge of the nurses during their training will 
find this information useful in helping nurses 
with their difficulties. 

The Working Party Report (23) also suggests 
that there is a strong indication that neurot- 
icism makes for an unsuitable temperament 
for nursing, and that one third of the wastage 
is due to lack of selection on grounds of 
temperament. Many of the tests contributing 
to our multiple correlation, such as word 
connection (4), persistence and manual dexter- 
ity (6), are highly related to neuroticism, and 
the maximum grasp of the dynamometer has 
been shown to be thus related in children 
(11). 

Of the three types of intelligence tests we 
have used (non-verbal, performance and verbal) 
only the non-verbal test has a high correlation 
with our criterion. Moreover, scores on some 
of the personality tests were more highly 
related to our criterion than was non-verbal 
intelligence: for example, the tendency to 
observe accurately; productivity in a Word 
Building Persistence Test, and the tendency 
to avoid mistakes in a manual task. This 
emphasizes the importance of not placing too 
much stress on Intelligence Tests alone in 
selecting nurses. 

The Lancet Commission (12) stated that 
“there is evidence that shortage (of nurses) 
is not only quantitative but qualitative.” 
We feel it is of prime importance to improve 
qualitative selection by the use of selection 
tests such as have been described above. 
This is necessary because, with an improve- 
ment in the quality of the student nurses, it is 
likely that the whole standard of nursing will 
be improved, both directly by selecting nurses 
who can be of greatest service to the patient, 
and indirectly by increasing the attractiveness 
of the profession to other able human beings. 


Summary 


1. One hundred and twenty-six nurses at 
St. George’s Hospital who have been in 
training for over six months were examined 
on a wide and varied group of personality 
and intelligence tests. 

2. Ratings by three independent judges on 
eighteen personality traits were used as the 
criterion. The ratings were made after the 
nurses had been in training for eighteen 
months. 

3. Twelve of the tests were found to give 
a multiple correlation of .611. These tests 
included measures of neuroticism, accuracy of 
observation, concentration, tendency to make 
manual mistakes, persistence, number of 
interests, and non-verbal intelligence. 

4. A Factorial Analysis of the rating 
scale suggests that, in addition to general 
nursing ability, there are two distinct require- 
ments made of the good nurses; one involves 
intellectual capacity, the other involves per- 
sonal relationships. Some of our measures 
are shown to be more highly related to one 
set of traits than to the other. 

5. It was found that examination results 
were not closely related to the Total Rating 
of nursing ability, nor to the three measures 
of general intelligence. 


Received July 26, 1950. 



Trends in the Use of Certain Attention-Getting Devices in 
Newsweekly Advertising * 


Kendall I. Trenchard 
Fordham University 


and 
William J. E. Crissy 
Fordham University and Queens College 


Attracting the reader’s attention to a printed 
advertisement is an antecedent and funda- 
mental step to the ultimate goal of making a 
sale. Many devices and techniques have been 
used to capture attention in printed media. 
Among those more frequently used are: 
size, position on the page, color, and illustra- 
tion. These and other attention-getters have 
been studied experimentally using various 
criteria for gauging their relative worth. In 
general, these are the usual research findings 
on the variables studied: the larger the 
advertisement, the more attention compelling; 
upper position on page is better than lower 
position, and upper left side is better than 
upper right side; color is better than black and 
white only, and 4-color advertisements cause 
a sharp rise in attention-getting; finally, and 
perhaps most obviously, the use of illustrations 
is an aid in getting attention. 

It is proposed in this paper to report merely 
the frequency of pre-war and post-war trends 
in the use of these listed factors in advertise- 
ments appearing in Time and Newsweek. 


Methodology 


Issues of the two magazines from two 5-year 
periods were studied—1936-1940 and 1945- 
1949. For each year, ten issues of each 
newsweekly were randomly selected. From 
each selected issue ten pages were randomly 
drawn. Those advertisements appearing on 
the selected pages comprised the samples 
studied. 
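
The two-stage random selection described above can be sketched as follows. The number of issues per year and of pages per issue, and the use of a seeded generator, are assumptions made for illustration; the paper does not describe the mechanics of the draw.

```python
import random


def draw_sample(years, issues_per_year=52, pages_per_issue=100,
                n_issues=10, n_pages=10, seed=0):
    """Randomly select issues within each year, then pages within each issue.

    Returns {year: {issue_number: [page numbers]}} for one magazine.
    issues_per_year and pages_per_issue are hypothetical defaults.
    """
    rng = random.Random(seed)
    sample = {}
    for year in years:
        issues = sorted(rng.sample(range(1, issues_per_year + 1), n_issues))
        sample[year] = {
            issue: sorted(rng.sample(range(1, pages_per_issue + 1), n_pages))
            for issue in issues
        }
    return sample


pre_war = draw_sample(range(1936, 1941), seed=0)
post_war = draw_sample(range(1945, 1950), seed=1)

pages_drawn = sum(len(pages) for year in pre_war.values() for pages in year.values())
print(pages_drawn)  # 5 years x 10 issues x 10 pages = 500 pages per magazine
```

On this reading each magazine contributes 500 pages to each period; the number of advertisements actually studied depends on how many appear on the selected pages.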


* This paper is based on one aspect of Trenchard’s 
M.A. dissertation entitled: A Longitudinal Study of the 
Readability of Text and Advertising Copy, Including the 
Recent Trends in Important Physical Characteristics. 
Fordham University, 1951. 


Size was categorized as ⅓-, ⅔-, or full-page. 
Position on the page was left vertical ⅓, 
left vertical ⅔, right vertical ⅓, right vertical ⅔, 
or both left and right vertical ⅓. Color was 
categorized as black and white or one, two, 
three, or four or more colors. Illustration 
was tabulated as present or absent. 

Mean values were computed for each factor, 
by each magazine separately, and by pre-war 
and post-war periods. These in turn were 
converted to percentages to facilitate inter- 
pretation. 


Results 


In Table 1 are presented the percentages of 
various size advertisements found in Time and 
Newsweek in the two periods studied. The 
only significant increase is in the percentage 
of full-page advertisements carried by Time. 
It does not seem unreasonable to suppose that 
this may account for the corresponding 
decrease in the percentage of ⅔-page 
advertisements in the same magazine. In Newsweek 
there is a significant decrease in the percentage 
of ⅓-page advertisements. The trends as they 
exist are in the direction of increased size. 


Table 1 


Pre-War and Post-War Differences in 
Size of Advertisements 


                     Pre-War                Post-War 
Size              Time    Newsweek       Time    Newsweek 
                   %         %            %         % 

Full page         40.8      53.6         59.2*     61.3 
⅔ page            40.8      24.0         25.9      22.4 
⅓ page            18.4      22.4         14.9      16.3† 

* Significant increase. 
† Significant decrease. 


Position of advertisements on the page is 
summarized in Table 2. Percentages of full-page 
advertisements are repeated from Table 1 
to facilitate comparison of results to a 
total 100%. Here the only significant increase 
is in the percentage of left vertical ⅔-page 
advertisements in Newsweek. Time shows a 
significant decrease in the percentage of right 
vertical ⅔-page advertisements. 


Table 2 


Pre-War and Post-War Differences in the Position of 
Advertisements on the Page 


                              Pre-War                Post-War 
Position                   Time    Newsweek       Time    Newsweek 
                            %         %            %         % 

Left vertical ⅓            12.5      ...          11.2      13.3 
Left vertical ⅔            16.8      ...          15.5       8.3* 
Right vertical ⅓            5.9      ...           3.6       2.9 
Right vertical ⅔           15.5      ...           6.9†     12.5 
Both left and right ⅓       8.5      ...           3.5       0.0 
Full                       40.8      ...          59.2*     61.3 

* Significant increase. 
† Significant decrease. 


An inspection of Table 3 reveals these 
significant trends in both magazines: toward 
less black and white only; an increase in the 
use of both single color advertisements and 
multi-color advertisements. Again, the trends 
are in the direction of increased use of color 
as a device for attraction of attention. 


Table 3 


Pre-War and Post-War Differences in the Use of 
Color in Advertisements 


                              Pre-War                Post-War 
No. of Colors              Time    Newsweek       Time    Newsweek 
                            %         %            %         % 

Black and white            89.9      ...          63.5†     58.1† 
One                         3.2      ...          ...       ... 
Two                         0.8      ...          ...       ... 
Three                       0.0      ...          ...       ... 
Four or more                6.1      ...          ...       ... 

* Significant increase. 
† Significant decrease. 


Table 4 


Pre-War and Post-War Differences in the Use of 
Illustrations in Advertisements 


                     Pre-War                Post-War 
                  Time    Newsweek       Time    Newsweek 
                   %         %            %         % 

Present           97.6      94.9         98.4      98.9* 
Absent             2.4       5.1          1.6       1.1† 

* Significant increase. 
† Significant decrease. 


While illustrations were almost invariably 
used in advertisements in these magazines 
in the pre-war years, it is worth noting in 
Table 4 that the significant trend is to an 
even greater use of illustrations in Newsweek, 
and this should make for increased attraction 
of attention. 


Summary 
The significant trends found in advertisements 
in Time and Newsweek with respect to 
size, position on page, color, and illustration 
are toward increased use of these devices for 
attraction of attention. 


Received June 14, 1951. 
Early publication. 





Speed of Manipulative Performance as a Function of 
Work-Surface Height * 


Douglas S. Ellis 
Iowa State College 


One of the primary tasks of the industrial 
psychologist is to obtain optimal relationships 
between the worker and the machines that 
he must operate. Three main approaches 
have been used in dealing with this problem: 
(1) selecting men to fit machine characteristics; 
(2) modifying men to fit machine character- 
istics; (3) designing machines to fit the 
characteristics of the available men. While 
selection and training have long been the 
basic techniques of the industrial psychologist, 
recent psychological research on equipment 
design (4) indicates that this approach can 
make valuable contributions toward increasing 
the effectiveness of the man-machine unit. 
The present problem may be considered as an 
investigation in the general area of psycho- 
logical research on equipment design. Specifically, 
the investigation is concerned with 
variations in the speed of performing a simple 
manipulative task as a function of work- 
surface height. 

Although considerable work has been done 
on various design variables, the work-surface 
height variable apparently remains unexplored. 
Barnes notes the variable in his discussion of 
design variables, and states (1, p. 272), “With 
40 inches taken as the average elbow height of 
the female workers (the range being from 34 
to 45 inches) and with the hand allowed to 
work 1 to 3 inches lower than the elbow, the 
average height of the working surface should 
be 37 to 39 inches.” No experimental 
evidence is cited to support the conclusion 
that the hand should be one to three inches 
lower than the elbow. 


* This paper is a portion of a dissertation submitted 
to the Department of Psychology of Northwestern 
University in partial fulfillment of the requirements for 
the Ph.D. degree. The writer is indebted to Professor 
R. W. Kleemeier for assistance during the preliminary 
phases of the problem, and to Professor B. J. Underwood 
for his many helpful suggestions during the completion 
of the research. 


Methodology 


Subjects. Ss were 48 volunteer male under- 
graduate students from sections of the intro- 
ductory psychology course at Northwestern 
University. The experiment was conducted 
during the fall of 1949. 

Apparatus. Work-surface height was con- 
trolled by an adjustable work-surface. This 
apparatus consisted of a plywood working- 
surface attached to two collars which rode on 
the vertical pieces of a framework of 1} in. 
pipe. A fluorescent lamp, mounted so the 
light source was 30 in. above the work-surface, 
furnished illumination. The task, a modifica- 
tion of the Minnesota Rate of Manipulation 
Test (MRMT), was mounted on the work- 
surface. 

Measurement of Work-Surface Height. The 
work-surface heights used in this experiment 
were determined for each S according to the 
equations presented in Table 1. This pro- 
cedure permits the indexing of work-surface 
height relative to the bodily dimensions of the 
individual. Table 1 also includes the average 
distance of the work surface above the floor 
and the average distance of the work surface 
from the elbow corresponding to the six heights 
employed. Expressing work-surface height in 
terms of distances above or below the elbow 
would be preferred for subsequent investiga- 
tions, since it has satisfactory accuracy and is 
simpler than the more complicated expression 
used in the present experiment (2, pp. 3-6). 
In the interests of simplicity we shall denote 
the six work-surface heights used in the experi- 
ment as Height 1, Height 2, etc., in conformity 
with the usage of Table 1. 
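
The way the six heights are fixed for an individual S may be sketched as follows. The coefficients are those of the equations listed in Table 1 (hand height plus a multiple of arm length); the hand height and arm length in the example are hypothetical and are not measurements from the experiment.

```python
# Coefficients of arm length (AL) in the six height equations of Table 1:
# Height 1 = HH - .10 AL, Height 2 = HH + .10 AL, ..., Height 6 = HH + .90 AL.
AL_COEFFICIENTS = (-0.10, 0.10, 0.30, 0.50, 0.70, 0.90)


def work_surface_heights(hand_height, arm_length):
    """Return the six work-surface heights (inches above the floor) for one S.

    hand_height: distance from finger-tip to floor, arm hanging at the side.
    arm_length:  length of the arm, in the same units.
    """
    return [hand_height + c * arm_length for c in AL_COEFFICIENTS]


# Hypothetical S with a hand height of 28 in. and an arm length of 27 in.
heights = work_surface_heights(28.0, 27.0)
print([round(h, 1) for h in heights])
```

For this hypothetical S successive heights differ by one fifth of the arm length, or 5.4 in., which is about the spacing of the average heights in Table 1.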

The Block-Turning Task. The task used 
was a modification of the turning portion of 
the MRMT. The test was adapted to the 
purposes of the experiment by changing it from 
a rectangular board with four rows of 15 blocks 
to a square board containing eight rows of 
eight blocks. However, the dimensions of the 
holes and blocks, and the center distance between 
holes, were not altered.¹ This modification 
permitted restriction of S’s foot movements to a 
small area. S was required to stand on a 
rectangular box 16 X 16 X 1} in. while 
working on the task. 


1 The dimensions were as follows: outside hole 
diameter, 1.56 in.; block diameter, 1.43 in.; distance between 
hole centers, 2.10 in. This center distance was the 
same between all holes. The blocks were painted red 
on one side and black on the other. 


Table 1 
Indices of Work-Surface Height* 


                             Average                 Average 
                             Distance                Distance 
Height    Equation           from Floor      σ       from Elbow 

1         HH—.10 AL           25.9”         ...       —18.9” 
2         HH+.10 AL           31.3          1.42      —13.5 
3         HH+.30 AL           36.6          1.61      — 8.2 
4         HH+.50 AL           42.0          1.81      — 2.8 
5         HH+.70 AL           47.4          1.97        2.6 
6         HH+.90 AL           52.7          2.25        7.9 

* The following abbreviations are used in this table: 
HH for hand height (distance from finger-tip to floor 
with arm and hand perpendicular to floor); AL for 
arm length. 

Although a modification of the MRMT was 
used in the present experiment, validation data 
obtained on the MRMT may be viewed as 
estimates of the characteristics of the modified 
task. Studies by Tuckman (7) and Jurgensen 
(6) indicate that the block-turning portion of 
the MRMT has satisfactory internal con- 
sistency. Both authors report corrected odd- 
even reliability coefficients which are greater 
than 0.90. These coefficients are based on 
more than 200 cases. Test-retest reliability 
has also been investigated by Tuckman, using 
100 high school students tested a median of 
seven days apart. Under these conditions a 
reliability coefficient of 0.81 was obtained. In 
addition, Jurgensen reports a correlation of 0.46 
between performance on the turning task of 
the MRMT and supervisor’s ratings of the 
performance of 60 converting machine 
operators in a paper mill. The available evidence 
indicates that the turning task of the MRMT 
has satisfactory reliability, and has at least 
some features in common with simple manual 
industrial operations. 

Work Methods. S’s task was to turn the 
blocks over in the holes, using both hands. 
The task was performed by rows, beginning 
by working to the right on the outermost row, 
and then systematically altering the direction 
of motion from left to right on each succeeding 
row. When S reached the innermost row, he 
immediately returned to the block at the left 
end of the outermost row and repeated the 
procedure. In turning over the blocks, S was 
instructed to pick up the block with the leading 
hand and place it in the hole, bottom side up, 
with the following hand. At all times S was 
to pick the block out of the hole and was to 
avoid fumbling the blocks over in the holes. 


Failure to get a block turned over or placed 
squarely in the hole was dealt with in the 
following manner: S was allowed to correct 
any errors as long as he was working in the 
row in which the error was made. However, 
once he had started another row he was to 
ignore any such errors and keep on working.²

Experimental Design. A simple Latin square 
design was used which provided for the coun- 
terbalancing of practice against work-surface 
height. With 48 Ss and six degrees of practice 
and work-surface height, such a design con- 
sisted of eight different six-by-six Latin squares. 
All squares were constructed according to the 
procedure recommended by Fisher and Yates 
(3), which provides for the random determina- 
tion of the sequences within a square as well as 
the random assignment of Ss to sequences. 
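
The layout described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Fisher and Yates procedure itself: it starts from a cyclic square and shuffles rows, columns, and height labels, which yields valid Latin squares but does not sample all possible squares with equal probability. Rows are taken to be Ss, columns to be trial positions (degree of practice), and cell entries to be work-surface heights; all function names are hypothetical.

```python
import random


def random_latin_square(k, rng):
    """Return a k x k Latin square (list of rows) with entries 0..k-1."""
    base = [[(r + c) % k for c in range(k)] for r in range(k)]  # cyclic square
    rows = list(range(k))
    cols = list(range(k))
    labels = list(range(k))
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    return [[labels[base[r][c]] for c in cols] for r in rows]


def is_latin(square):
    """Check that every row and every column holds each entry exactly once."""
    target = set(range(len(square)))
    return (all(set(row) == target for row in square)
            and all(set(col) == target for col in zip(*square)))


def build_design(n_subjects=48, k=6, seed=0):
    """Map each S (1..n_subjects) to the order of heights (1..k) over k trials."""
    rng = random.Random(seed)
    subjects = list(range(1, n_subjects + 1))
    rng.shuffle(subjects)  # random assignment of Ss to sequences
    design = {}
    for s in range(n_subjects // k):  # 48 / 6 = 8 squares
        square = random_latin_square(k, rng)
        assert is_latin(square)
        for r, row in enumerate(square):
            design[subjects[s * k + r]] = [h + 1 for h in row]
    return design


if __name__ == "__main__":
    for subject, order in sorted(build_design().items())[:6]:
        print(subject, order)
```

Within each square every height occurs once at each trial position, so over the eight squares each height is met eight times at every degree of practice.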

The essential feature of the design is that it 
permits each S to serve under all experimental 
conditions, e.g., to work at each of the six 
work-surface heights. The design is particu- 
larly powerful when coupled with analysis of 
variance, which furnishes estimates of the vari- 
ance attributable to subjects, practice, and 
height. These estimated variances may be 
deducted from the total variance to yield an 
error estimate of high precision. Although 
each S served in two experimental sessions (see 
below), the same Latin square was used on 
both days. 

Experimental Procedure. Each S served in 
two 45-minute experimental sessions which 
were 48 hours apart. Both sessions were de- 
voted primarily to six three-minute trials at 
each of the six work-surface heights. These 
trials were separated by two-minute rest peri- 
ods. Ss were given knowledge of results at 
the completion of each trial, and were fully 
informed as to the purpose of the experiment. 

At the start of each session, S was given a 
three-minute practice trial during which any 
marked deviations from standard work meth- 
ods were corrected. Following completion of 
this practice trial, S performed the six experi- 
mental trials. Before each of these trials S 
was requested to work as fast as possible. 
Immediately after completing any trial, S was 
given two rating scales and a check-list to fill 
out.³ Knowledge of results was given immediately
after these materials were completed.
During the remainder of the two-minute rest 
interval S was free to relax in a comfortable 
chair while E turned over the remaining blocks 


2 Records were kept of such uncorrected errors for
24 Ss. The average number of uncorrected errors per
S per trial was 0.25. One may conclude that differ- 
ences in uncorrected errors were not a major factor in 
the experimental results. 

3 The rating scales were seven-point vertical scales 
designed to measure feelings of tiredness and muscular 
strain. The check-list was constructed so that S could 
indicate the locus of any muscular strain that was felt. 
and adjusted the work-surface to the appropriate
height for the subsequent trial.

The procedure was similar during the second 
experimental session. At the conclusion of the 
session, S adjusted the work-surface to the 
height at which he believed it would be most 
comfortable to work. 


Results 


Height-Performance Functions. The per- 
formance means at the six work-surface 
heights are plotted in Figure 1. In the case of 
the curves for Day I and Day II, each point 
is based on the average performance of 48 Ss 
for a three-minute trial. The pooled curve 
is merely an average of the curves for Day I 
and Day II. Inspection of Figure 1 indicates 
that the height-performance function is smooth 
but asymmetrical, with optimal performance 
occurring in the region of Height 4. The 
curves for Day I and Day II are of similar 
shape, although the Day II curve indicates a 
considerably higher performance level. 

Figure 2 presents the practice curves for 
Day I and Day II. As is the case with the 
height-performance functions, each point is 
based on the average performance of 48 Ss for
a three-minute trial. It can be seen that, 
although the curve is slightly negatively 
accelerated, learning is occurring at a rapid 
rate. 

In analyzing the statistical significance of 
the data it is worthwhile to first consider the 
possibility of combining the data by days. 
This possibility appears appropriate because 
of the similar shape of the height-performance 
curves of Day I and Day II. Such a pro- 
cedure, if justifiable, would serve two func- 
tions: (1) it would summarize the data for 
the two days in one treatment; (2) it would 
permit the use of greater degrees of freedom in 
evaluating the significance of the major 
effects. The justifiability of combining the 
data by days and treating them by analysis 
of variance depends upon the fulfillment of 
three conditions: (1) the obtained frequency
distributions, combined by days, must not 
depart significantly from normality; (2) the 
variances associated with the major variables 
must be homogeneous; (3) the shape of the 
height-performance function must not be 
differentially affected by days. Since examination
of the data indicated that they did
indeed satisfy these three conditions,⁴ it may
be concluded that analysis of variance of the
combined data is an appropriate statistical
procedure.

Fig. 2. Practice curve on block-turning task.

Table 2 presents the results of this analysis 
of the combined data. It can be seen that 
the effects of height, practice, individual 
differences among subjects, and days are all 
significant well beyond the 1% level of 
confidence.
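
The mean squares and F ratios behind this statement can be recomputed from the degrees of freedom and sums of squares given in Table 2. The sketch below does only that arithmetic; the dictionary and function names are hypothetical, and the error term is used as the denominator for every F, as the table note specifies.

```python
# Degrees of freedom and sums of squares as given in Table 2.
TABLE_2 = {
    "Heights": (5, 25210),
    "Trials": (5, 49820),
    "Subjects": (47, 350710),
    "Days": (1, 136860),
    "Error": (517, 54855),
}


def f_ratios(table, error_key="Error"):
    """Return (error mean square, {source: (mean square, F)})."""
    df_err, ss_err = table[error_key]
    ms_err = ss_err / df_err
    results = {}
    for source, (df, ss) in table.items():
        if source == error_key:
            continue
        ms = ss / df
        results[source] = (ms, ms / ms_err)
    return ms_err, results


if __name__ == "__main__":
    ms_error, ratios = f_ratios(TABLE_2)
    print(f"Error mean square: {ms_error:.1f}")
    for source, (ms, f) in ratios.items():
        print(f"{source:<9} MS = {ms:>10.1f}   F = {f:>8.1f}")
```

The degrees of freedom and sums of squares in the table add to the stated totals of 575 and 617,455, which gives some assurance that the entries are internally consistent.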

The effects of work-surface height may be 
subjected to more detailed analysis by the 
t test. Essentially, this procedure involves
comparing each point of the pooled height-
performance curve of Figure 1 with every
other point. The resultant values of t are
arrayed in Table 3. It can be seen that the 
majority of inter-height comparisons are 
statistically significant. Of the 15 possible 
comparisons, two are not significant, two are 
significant at the 5% level of confidence, and 

4 A detailed description of the statistical operations
involved is presented in the author’s thesis (2, pp.
13-16, 18-20).


the remaining 11 are significant at the 1% 
level of confidence. These data seem to 
indicate clearly that the shape of the height- 
performance function approximates the pooled 
curve presented in Figure 1. 
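
Because every t in Table 3 rests on the same error estimate, each value is simply a difference between two height means divided by one common standard error. The sketch below makes that relation explicit. It assumes 96 observations per height mean (48 Ss on each of two days, i.e. 576 observations over six heights), which is inferred from the degrees of freedom rather than stated in the text; the function names are hypothetical.

```python
import math

MS_ERROR = 54855 / 517   # error mean square from Table 2
N_PER_HEIGHT = 96        # assumption: 48 Ss x 2 days per height mean


def se_of_difference(ms_error=MS_ERROR, n=N_PER_HEIGHT):
    """Standard error of the difference between two means of n scores each."""
    return math.sqrt(2.0 * ms_error / n)


def t_for_difference(mean_a, mean_b, ms_error=MS_ERROR, n=N_PER_HEIGHT):
    """t for the difference between two height means."""
    return abs(mean_a - mean_b) / se_of_difference(ms_error, n)


def difference_for_t(t, ms_error=MS_ERROR, n=N_PER_HEIGHT):
    """Mean difference (in blocks per trial) implied by a tabled t."""
    return t * se_of_difference(ms_error, n)


if __name__ == "__main__":
    print(f"standard error of a difference: {se_of_difference():.2f} blocks")
    # 12.58 is the largest t in Table 3 (Height 4 against Height 6).
    print(f"difference implied by t = 12.58: {difference_for_t(12.58):.1f} blocks")
```

Under these assumptions, a difference of roughly three blocks per trial between two height means is enough to reach the 5% level.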

Feelings of Muscular Strain. Results ob- 
tained from the rating scale designed to meas- 
ure feelings of muscular strain are plotted in 
Figure 3. The curve represents pooled results 


Table 2
Analysis of Variance of Pooled Data

Source of      Degrees of      Sum of        Mean
Variation      Freedom         Squares       Square

Heights             5           25,210         5,042
Trials              5           49,820         9,964
Subjects           47          350,710         7,462
Days                1          136,860       136,860
Error             517           54,855           106

Total             575          617,455

* Denotes F significant at the 1% level of confidence.
The error estimate is the appropriate denominator for
all F tests in this table.
Table 3

Comparisons of Mean Block-Turning Performance
at the Six Work-Surface Heights by t.
Data Pooled for Day I and Day II

Height      1         2         3         4         5         6

  1                5.00**    7.14**    6.79**    4.56**    5.19**
  2                          2.14*     2.39*     0.44     10.19**
  3                                    0.26      2.58**   12.33**
  4                                              2.83**   12.58**
  5                                                        9.75**

* Denotes t significant at the 5% level of confidence.
Since the ts are based on the error estimate of Table 2,
they are evaluated with the associated 517 degrees of
freedom. t = 1.97 at the 5% level of confidence.

** Denotes t significant at the 1% level of confidence.
For 517 degrees of freedom, 1% t = 2.57.


for Days I and II, since the individual day 
curves were highly similar. It can be seen 
that the function relating muscular strain to 
work-surface height is asymmetrical, with 
minimum strain occurring at Height 4. 
Evidently, high work-surface heights are
conducive to maximum strain, although strain
also increases markedly at low heights. 
Comparison of this curve with the height- 
performance functions of Figure 1 reveals an 
inverse relationship between performance and 
feelings of muscular strain. Minimal muscular 
strain occurs at Height 4, which is also the 
point of maximal performance. 

Results from the rating scale used to measure 
feelings of tiredness exhibited the same general 
trend as the muscular strain data. However, 
the curve was considerably flatter than that 
of Figure 3. 

Locus of Muscular Strain. Table 4 sum- 
marizes the results obtained from the check- 
list used to measure the locus of feelings of 
muscular strain. The results were pooled for 
days and adjacent heights since these variables 
did not appear to seriously alter the general 
form of the data. It can be seen that involve- 
ment of back and leg muscles is maximal at 
the lower heights, while the higher heights 
maximize involvement of the upper arm, 
shoulder, and to a lesser extent, the forearm. 
The obtained chi-square tests the hypothesis 
Table 4
Variation of Locus of Muscular Strain with Work-Surface Height. Data Combined for Day I and Day II* 








Locus of Strain 





Upper 


Heights 


1 and 2 Y ete 
3 and 4 39 
5 and 6 108 


Arm 








Shoulder Other 
45 26 46 
34 26 37 
37 45 


Forearm 








Total Nee 185 
χ² = 135.93





89 
1% χ² = 23.21








* Tabled values represent the total number of times muscular strain was localized in that portion of the body. 


that all samples were drawn from the same 
population. Its significance, far beyond the 
1% level of confidence, permits rejection of 
the hypothesis and the consequent inference 
that the locus of muscular strain varies with 
work-surface height. 
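
The chi-square test used here can be sketched as follows. The counts in the example are hypothetical and are not the Table 4 frequencies; they only illustrate the computation. The 1% value of 23.21 quoted in Table 4 is the tabled chi-square for 10 degrees of freedom, which is what three height groupings by six loci of strain would give; the six-locus layout is an inference from that critical value, not something stated in the text.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if locus of strain were independent of height.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


if __name__ == "__main__":
    # Hypothetical counts: rows are height groupings, columns are loci of strain.
    hypothetical = [
        [60, 50, 20, 25, 30, 40],   # Heights 1 and 2
        [30, 25, 40, 30, 35, 35],   # Heights 3 and 4
        [15, 10, 100, 45, 90, 40],  # Heights 5 and 6
    ]
    stat, df = chi_square(hypothetical)
    print(f"chi-square = {stat:.2f} with {df} degrees of freedom")
    print("exceeds the 1% value of 23.21:", stat > 23.21)
```

A statistic larger than the 1% value leads to rejection of the hypothesis that the distribution of strain over body regions is the same at all heights.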

Preferred Work-Surface Height. It will be 
recalled that, at the conclusion of the experi- 
mental session of Day II, S was requested to 
adjust the work surface to the height at 
which he believed it would be most comfortable 
to work. An average preferred work-surface 
height of 41.3 in. with an associated standard 
deviation of 2.51 in. was obtained from this 
procedure. This value is in close agreement 
with the point of optimal performance of the 
height-performance functions of Figure 1, 
where maximum performance occurs at Height 
4. The mean work-surface height at Height 
4 is 42.0 in.⁵


5 Within the framework of the present experiment, 
data were also obtained relating speed of manipulative 
performance to row distance, defined in terms of the 
distance of any row of blocks from the edge of the 
work-surface nearest to S. These row distance-per- 
formance functions were based on measures of the time 
taken by S to complete selected rows of blocks. The 
measures were procured during the three-minute trials 
which were the basic work unit of the major experi- 
mental design. 

Analysis of variance of these data indicated that 
block-turning performance varied significantly with row 
distance. Maximum speed occurred at intermediate
row distances, while far row distances were conducive 
to faster performance than close row distances. Fur- 
ther analysis revealed a significant interaction between 
work-surface height and row distance. It was found 
that, as row distance increases, the maximum of the
height-performance function tends to shift towards
lower work-surface heights. Optimal performance 
occurred at Height 3 with an intermediate row distance 
of 11.35 in. 


Discussion 


The obtained data relating manipulative 
performance, feelings and locus of muscular 
strain, and subjective judgments of preference 
to work-surface height are of direct interest 
for problems of equipment design. The data 
are easily summarized since they are highly 
consistent in pointing to Height 4 as the 
optimal work-surface height. Maximal per- 
formance, minimal feelings of muscular strain, 
and Ss’ judgments of the preferred work- 
surface height all occur in the neighborhood of 
Height 4. The average work-surface height 
at Height 4 is 42.0 in., which corresponds to an 
average distance of approximately 3 in. 
below the elbow. 

Practical Significance of Results. The ap- 
plicability of these results to problems of 
industrial equipment design depends upon 
the degree of similarity between the experimental
and industrial situation. It should be
emphasized that the experimental results 
were obtained on college students performing 
a highly simplified task under conditions 
which permitted regular alternation of three- 
minute work periods with two-minute rest 
periods. Obviously, these conditions are not 
typical of those found in industry, and the 
generalization of the results to the industrial 
situation is proportionately hazardous. When 
it is further considered that under these 
conditions performance at the poorest work- 
surface height was only six per cent slower 
than performance at the best height, it is 
apparent that the results should not be
viewed as a justification for the installation
of adjustable work-surfaces in industrial situations.

However, the obtained results do have 
direct practical implications in at least two 
areas. First, they point to work-surface 
height as an equipment design variable which 
does have a consistent effect on manipulative 
behavior and which therefore deserves further 
investigation. Second, the data are relevant 
to the design of psychomotor tests. Since 
psychomotor test trials are usually less than 
three minutes in length, and since with three- 
minute trials only a six per cent performance 
differential was found between extreme heights, 
one may conclude that such tests are probably 
not seriously influenced by their characteristic 
neglect of the work-surface height variable. 

Interpretation of Results. The problem of 
interpretation lies in specifying the variable 
intervening between work-surface height and 
performance. One possible approach would 
be to postulate muscular tension as this 
variable, since the experimental data indicate 
that variations in work-surface height are 
accompanied by changes in both the intensity 
and locus of muscular strain. Such theorizing 
is not only consistent with the experimental 
data, but also leads to experimentally testable 
hypotheses. For example, the locus-of- 
muscular-strain data presented in Table 4 
suggest the hypothesis that maximum perform- 
ance decrement occurs at the design variable 
setting which localizes strain in body members 
most directly involved in task performance.
Examination of Table 4 indicates that at 
high heights, where maximal performance 
decrement occurs, muscular strain is localized 
in the shoulder, upper arm, and forearm. 
These body members are of direct importance 
in the performance of the block-turning 
task. Experimental verification of the 
hypothesis could take the form of systemat- 
ically varying the locus of induced muscular 
tension during the performance of a manipula- 
tive task, and noting the attendant variations 
in performance. 

Although it appears reasonable to postulate 
muscular tension as the process mediating 
between work-surface height and performance, 
a serious deficiency of such theorizing is that 
it fails to specify the manner in which muscular 
tension exerts an influence on performance. 
Thus, while the postulated intervening variable
is related to a manipulable stimulus variable 
(work-surface height) it is not anchored in 
the response domain. At this point it seems 
appropriate to examine existing theoretical 
structures for intervening variables or con- 
structs which are influenced by muscular 
tension and which operate directly at the 
effector level to influence performance.

It appears to the author that Hull’s con- 
struct of reactive inhibition (5, pp. 277-299) 
might fit these specifications. For Hull, 
reactive inhibition is generated by virtue of 
response evocation and has the function of a 
negative motivational state. This state is 
assumed to dissipate with the passage of time, 
and is viewed as operating primarily at the 
effector level. The magnitude of this in- 
hibitory potential is considered to be directly 
dependent on the amount of work involved in 
the response under consideration. In order 
to assume that work-surface height influences 
performance through its action on reactive 
inhibition, it is necessary to make the assump- 
tion that the amount of work involved in a 
response is a positive function of the tension of 
the muscle systems of the organism. Such 
theorizing would lead to the view that work- 
surface height influences muscular tension 
which in turn influences the amount of reactive 
inhibition generated by response evocation. 
Reactive inhibition would then operate more
or less directly at the effector level to inhibit 
block-turning responses. 

Application of this construct to the work- 
surface height variable yields the deduction 
that greater reactive inhibition should be 
associated with performance at high work- 
surface heights than with performance at 
moderate heights, since muscular tension is 
greater at these high heights (see Figure 3). 
This deduction lends itself readily to experi- 
mental verification. Since reactive inhibition 
dissipates with time, one would expect greater 
post-rest reminiscence at high heights than at 
some intermediate work-surface height such 
as Height 4. 

Experimental verification of this deduction 
would be of considerable interest because it 
would provide an indication of the applicability 
of general constructs developed in the field of 
learning to the problems of equipment design. 
The development of such general constructs
is basic to the erection of a psychological 
theory of equipment design which would 
permit prediction of the effects of equipment 
design variables without resorting to the 
investigation of each specific equipment design 
problem.


Summary


An experiment was performed to determine 
the relationship between work-surface height 
and speed of manipulative performance of a 
simple block-turning task. Ss were 48 male 
college students who performed the task at 
six levels of work-surface height, ranging from 
a minimum average height of 25.9 in. to a 
maximum average height of 52.7 in. A 
Latin-square experimental design was used, 
in which each S worked at each work-surface 
height for a three-minute trial. Analysis of 
the data yielded the following results: 

1. Statistically significant variations in 
speed of manipulative performance were 
associated with variations in work-surface 
height. Maximum performance occurred at 
an average height of 42.0 in., which corresponds 
to a setting approximately 3 in. below the 
elbow. Significantly slower performance oc- 
curred at higher work-surface heights than at 
lower heights. 

2. Significant variations in feelings and 
locus of muscular strain are associated with 
variations in work-surface height. 

An interpretation of the obtained results is 
offered on the assumption that muscular
tension is the variable intervening between
work-surface height and performance. The 
applicability of the construct of reactive 
inhibition to the data is also noted, and an 
experiment is suggested to test an hypothesis 
stemming from the use of this construct. 
While the results are considered as having only 
minor immediate practical value for problems 
of industrial equipment design, they are 
viewed as indicating that work-surface height 
is an equipment design variable which is 
worthy of further investigation. 


Received October 5, 1950.


Derber, Milton (editor). The aged and society.
Champaign, Illinois: Industrial Relations 
Research Association, 1950. Pp. 237. 
$3.00. 


This volume of fifteen papers by as many 
authors is one of several collections dealing 
with the aged which have appeared recently, 
and attest the mounting interest in this 
total field. The chapters are grouped in 
three parts. Part I, “The New Age Distribu- 
tion in the New Society,” includes papers by 
Shryock of the United States Bureau of the 
Census on “The Changing Age Profile of the 
Population”; Moore of the Princeton Office of 
Population Research on “The Aged in Industrial 
Societies”; and T. Lynn Smith of the Univer- 
sity of Florida on “The Aged in Rural Society.” 

Part II on “Older Workers and Social 
Patterns” starts with papers by Otto Pollak 
of the University of Pennsylvania on “The
Older Worker in the Labor Market,” and
“The Role of Industry in Relation to the
Older Worker” by J. Douglas Brown of
Princeton. Next is a statement of “Union
Policies and the Older Worker” by Solomon 
Barkin, Director of Research of the Textile 
Workers Union. “Self-Provision for the Aged” 
by Moore of the University of Oregon deals 
primarily with retirement of professional 
people. Slichter of Harvard has a hard-
hitting discussion of “Retirement Age and 
Social Policy,” stressing economic loss resulting 
from early arbitrary retirement. Witte of 
the University of Wisconsin summarizes 
“Social Provisions for the Aged” (pensions,
old age assistance, etc.).

Part III is entitled “Research—Present and 
Prospective.” A chapter by Ernest W. Burgess
of the University of Chicago reviews studies 
on personal and social adjustment in old age, 
with special reference to work on these 
problems at that institution. Lloyd H. Fisher 
of the University of California has a broad 
consideration of “The Politics of Age.” Shock 
of the National Institute of Health summarizes 
research in this country on the psychology 
of the aging, while Welford and Speakman of 
the Nuffield Research Unit into Problems of
Aging at Cambridge review English work on
the employability of older people. Oscar
Kaplan of San Diego State College has a
broad appraisal of “The Mental Health of
Older Workers.” The final very human
paper is by another Englishman, Dr. J. H. 
Sheldon of Wolverhampton, on “Medical-
Social Aspects of the Aging Process.”

Evidently, the range of topics is wide; and
the contributors are indeed diverse as regards 
specialty (physician, physiologist, psychologist, 
sociologist, economist), geographical locations, 
and orientations in regard to the subject. 
Nevertheless, there is a broad unity, and an 
unusually high excellence of treatment is 
maintained throughout. Perhaps the over- 
all message of the volume might be compacted 
into three statements: the increases in propor- 
tion of the aged in this country (combined 
with its urbanization and industrialization) 
have indeed created a new set of socio-economic 
problems; at present there is much groping 
about in regard to these problems; research is 
beginning to show ways in which these 
problems can be met, to make old people 
increasingly useful and valued members of 
the community. 


S. L. Pressey 
Ohio State University 


Cleeton, Glen U. Making work human. 
Yellow Springs: Antioch Press, 1949. Pp. 
326. $3.75. 

The primary thesis of this book is that 
work can be made fully as satisfying as 
leisure-time activities. The author contends 
that work can be directed toward the objectives 
of production and service and still be made to 
conform to the basic desires and capacities 
of man. The book includes fundamental 
principles of human nature and their applica- 
tion to the solution of problems of work 
adjustment. As a basis for conclusions pre- 
sented in other chapters, Cleeton uses the 
following classification of needs and wants of 
man: food, air, and moisture; bodily well 
being; activity; mating; sharing thoughts and 
feelings; dominance; self-determination;
achievement; approbation; ideation. In the
opinion of this reviewer, a reader’s evaluation 
of the book is apt to be highly correlated with 
his view of the adequacy of the above classifica- 
tion. 

The author has made numerous frontal
attacks on viewpoints and behavior of leaders
in industry, government, and labor. He is
to be commended for his excellent balance, 
although it is not apt to increase the popularity 
of the book with prejudiced readers. 

Although the bibliography lists 155 items,
some well-known works are surprisingly 
omitted. For example: the author recom- 
mends Kaplan’s “Encyclopedia of Vocational 
Guidance” but makes no reference to Buros’ 
“Mental Measurements Yearbook” as a source 
of information about tests. The author is 
not always cautious in his generalizations. 
Statements following phrases such as “It is 
known ...” may cause embarrassment to 
instructors using the book as a text. State- 
ments preceded by “It has been estimated 
that . . .” will cause some readers to ask 
“By whom?” and “On what basis?” 

An excellent appendix includes topics (arranged
by chapters of the book) for further
study or for practice in application of prin- 
ciples; a bibliography of 155 items (from 
Collier’s, Life, AAF Aviation Psychology 
Research Reports, standard psychology texts, 
etc.); index of organizations mentioned in the 
book; and subject index. 

The vocabulary and sentence structure 
will frequently be stumbling blocks to persons 
to whom the book is directed, i.e., “representa- 
tives of management who are responsible for 
the work of others, and for those workers who 
are interested in the problem of work adjust- 
ment.” The book appears likely to be more 
useful in schools of technology as collateral 
reading to serve as an antidote to the ex- 
clusively mechanistic approach typical of so 
many such students. 


C. E. Jurgensen 
Minneapolis Gas Company 


Dooher, M. Joseph, and Marquis, Vivienne.
The AMA handbook of wage and salary administration.
American Management Association, New York, 1950. Pp. 412. $7.50.

Mr. Dooher and Miss Marquis have attempted
to give management guidance by
collating a number of papers that have
been prepared for, or delivered at, various
American Management Association conferences
in the past. They have attempted to
arrange the papers in some logical sequence. 
However, they hang together quite loosely. 
It is amusing to see a book published in this 
day and age on wage and salary administration 
that is full of numerical techniques and 
hortatory platitudes but has only one reference 
to the most important factor that influences 
wage payments today—collective bargaining. 
There is just one paper on office salary admin- 
istration under collective bargaining—an area 
of relatively small importance in setting 
national wage policies and trends. 

If the editors had omitted Parts I and II, 
“Basic Principles and Approaches in Wage 
and Salary Administration” and “Conducting 
a Wage Survey,” we would at least have had 
an adequate handbook on job evaluation. 
However, we must be guided by the title and 
prospectus as outlined by the editors of the 
book, rather than what we think a good 
revision would have been. 

The very first paper, called “Some Basic 
Principles of Wage and Salary Administra- 
tion,” by Herbert Fuhrman, sets down this 
basic principle, “The need for scientific wage 
determination in every enterprise... is a 
well established fact (p. 11). . . . The com- 
pensation plan should be based upon prevailing 
locality wage rates (p. 12).” 

This is utter nonsense on two counts. First 
of all, there is no such thing as scientific 
wage determination. Wages come as the 
result of an equity concept rather than any 
objective determination. It is a bargaining 
concept which establishes a norm on the 
basis of the relative power and economic 
requirements of the management and the 
working force. Second, what is so scientific 
about prevailing locality wage rates? Suppose 
I, as a union organizer, have organized a plant 
in a small southern community which has 
hitherto served as a refuge for factories 
attempting to escape unionization. The pre- 
vailing rate in this community is far below 
payments for the same work elsewhere. Why 
should I even respect this community rate 
set by the marginal chiseler? After all, 
what is the so-called going rate in a community
except the lowest rate with which an employer
has been able to get by before organization
of his employees? Or take another case. A 
giant corporation manufactures automobiles 
in Dallas and Detroit. Let us assume the 
prevailing rates in other Detroit industries 
for machinists are at least twice the rate in 
Dallas. However, the same automobile com- 
mands the same price in New York City 
whether it was manufactured in Dallas or 
Detroit. Why should a union negotiator be 
guided by the so-called low going rate for 
machinists in Dallas when he attempts to 
negotiate a rate for machinists in the Dallas 
automobile factories?

Kress informs us in a similar paper that 
“There are four fundamentals to a sound 
wage payment policy: 1) the adoption of a 
fair wage policy, 2) the establishment of fair 
wage differentials, 3) the setting of proper 
standards of performance” et al. This is 
typical of the peculiar tautologies with which 
the book is replete. These expressions are
pious platitudes; they lend no insight to
operational procedures. 

The papers on job evaluation techniques are 
quite detailed on how to carry out the clerical
operations required. There is a minimum of
critical treatment of the techniques that 
point up the technical limitations of the 
operational procedures. The only paper of a 
critical nature is one by Professor Lawshe of 
Purdue University. 

With all its faults, however, this is a must 
book in the library of every trade unionist and 
industrial relations executive, if for no other 
reason than that prevailing ideas influence 
the behavior of key individuals even when 
those ideas make no sense. 


William Gomberg 
Management Engineering Department, 
International Ladies’ Garment Workers’ Union, 
New York, New York 


Hahn, Milton E., and MacLean, Malcolm S. 
General clinical counseling in educational 
institutions. New York: McGraw-Hill, 
1950. Pp. xi + 375. $3.50.

This book suggests a job title for an area of 
practice, and defines that area systematically. 
It attempts to clarify the semantic fog in 
which such terms as counseling, guidance, 
psychotherapy, and “personnel work” are 
adrift and such will-o’-the-wisp modifiers as
vocational, educational, clinical, and “per- 
sonal” confuse the beginner and harass the 
professional. The vagueness of these terms 
continues to thwart the accurate specification 
of training, qualifications, and ethical practices. 
Hahn and MacLean have not only attempted 
to set down these specifications, but to clarify 
the skills, knowledge, and competencies re- 
quired of practitioners. In so doing they 
have distilled much thinking and reading and 
clinical experience into a book which provides 
excellent professional orientation, brief but 
adequate instruction in many tools and 
techniques, but somewhat less than a compre- 
hensive understanding of the psychology of 
counseling. 

As the authors see it, general clinical counsel- 
ing in educational institutions comprises such 
a breadth of diagnostic and therapeutic skills 
that training for the work must blend that of 
the psychotherapist, the vocational counselor, 
the educational adviser. The clinical counselor
is a broad specialist who utilizes the
diagnostic tools of all three and “applies
therapy of significant depth in those problem 
areas in which his major interests and com- 
petencies lie.” He deals with many more 
variables than such specialized psychologists 
as “Rogerian” counselors and tends therefore 
to have his “professional foundations deeply 
sunk in differential psychology.” In discuss- 
ing the competencies involved the authors 
mark out no new ground, but emphasize the 
common ground of the various specialties. 
Occupational information, case study tech- 
niques, and projective techniques are thus 
given equivalent space. The discussion of 
problems and techniques contributes to the 
clarification of many concepts, notably the 
term “prediction” in relation to counseling. 
The authors put their finger on some misleading 
current catch phrases, e.g., “blind alley jobs,” 
“social intelligence,” “group counseling,” and 
they vigorously dispose of some fallacies 
which counselor-trainers know to be more 
durable than straw. A brief consideration of 
the application of semantics to counseling is a 
helpful feature, although the reference to 
semantics as a “tool” is confusing. It is 
striking, however, that a good part of the 
text is devoted to diagnostic skills, while 
therapeutic skills are given honorable mention 
but practically no development. The picture 
of counseling which emerges is the eclectic- 
diagnostic one of the Williamson tradition, 
in which diagnosis is taken as the keynote of 
counseling skill. The manner of emphasis 
tends to imply that counselor insight is con- 
sidered to be somehow therapeutic for the 
client, an implication which many counselors 
(including perhaps the authors) would reject. 
In any case, the reader’s impression is that 
therapeutic skills are neither tangible nor 
teachable, while diagnostic skills are. The 
picture of the counselor as therapist simply 
does not emerge. 

The book would convince the beginner that 
there is a lot to learn. For the advanced 
student, it seems short in basic theoretical 
orientation to counseling. For example, the 
Rogerian approach is characterized as a set 
of special methods (without describing them) 
and the more fundamental contribution of 
non-directive theory is not discussed. This 
would seem to be a major shortcoming in 
view of the inclusive title of the book. Sim- 
ilarly, the psychological treatment of inter- 
viewing seems sketchy by comparison with 
the “rules for orientation” proposed by 
Roethlisberger and Dickson years ago. The 
chapter on interests, although expertly calling 
attention to the practical difficulties of 
interpreting interest inventories, seems less 
theoretically oriented than it might have been. 

In a sense, the major contribution of the 
book is its title. This is not faint praise. 
The term general clinical counseling represents 
an area of training and practice which already 
has wide recognition, but for which a widely 
accepted name has been lacking. It is to 
be hoped that the job title will enjoy the vogue 
it deserves, that its use will further the destruc- 
tion of artificial compartments of training, 
and result in the effective cross-fertilization 
of clinical psychology, guidance, and “Roger- 
ianism.” 

B ‘ 

University of Florida 


Brayfield, Arthur H. (Ed.). Readings in 
modern methods of counseling. New York: 
Appleton-Century-Crofts, Inc., 1950. Pp. 
526. $5.00. 

The purpose of this valuable source book is 
well stated in its preface: “to present a fairly sys- 
tematic account of modern counseling theory, 
practice and research as found in recent peri- 
odical literature” (p. v). Classroom teacher, 
student and practitioner alike will be grateful 
for the compilation of original papers, hitherto 
scattered among elusive books and dusty 
back volumes of periodicals. Moreover, the 
readings are enriched and integrated by the 
author’s comments in his preface and in his 
introductory remarks preceding each block 
of readings. The five major divisions of the 
book deal with clinical method, diagnosis, 
treatment, interviewing, and evaluation. The 
author’s opening section on “counseling in 
transition” points up a general trend toward 
professionalization and more rigorous training 
standards in the field of counseling, along with 
a “linking of counseling and psychotherapy.” 
In all, a healthy balance is maintained in the 
provision of readings distributed among the 
“educational, vocational and personal” aspects 
of counseling. 

Through no fault of its editor, Modern 
Methods of Counseling is deficient in one 
respect. This is in the conspicuous absence 
of any basic, integrative theory of counseling. 
Research evidence in the field of counseling 
appears to be disappointingly far removed from 
providing system and clarity in counseling 
method. Hard-headed research is provided 
by the “Minnesota” group, but much of this 
is piece-meal. Systematic thinking and sup- 
portive research are furnished by the non- 
directive group, but claims for the system 
appear premature because the evidence is 
incomplete and equivocal. Other systematic 
contributions in the book are only speculative. 

Nevertheless, this book is a “must” for the 
library of every counselor. It presents a 
varied and representative sample of writings 
about counseling. It provides opportunity for 
thorough-going review of original sources, 
and helps to prevent the naive, often ill- 
founded acceptance of watered-down con- 
clusions perpetuated in text-books. 

A convenient gage of recent developments 
in counseling, illustrated by these collected 
papers, is provided in Paterson’s stimulating 
chapter on “the genesis of modern guidance” 
(pp. 13-26) which was written nearly fifteen 
years ago. His predictions concerning the 
rapid expansion of counseling and testing 
services have been verified; his plea for a 
research efforts has been 


Harold B. Pepinsky 
The Ohio State University 


Farnsworth, Paul R. Musical taste: its 
measurement and cultural nature. Stanford, 
California: Stanford University Press, 
University Series, Education-Psychology, 
Vol. II, No. 1, 1950. Pp. 94. $1.50. 


This book begins with a chapter in which 
the general problem of taste in music is 
stated and the more specific and controversial 
questions are identified. This is followed in 
six further chapters by critical summaries 
of the available data on the laws of taste, the 
formal measurement of taste, polling and its 
problems, operational procedures, conditioners 
of taste and studies in the evaluation of 
eminent composers. In a final chapter, the 
significant conclusions from this wealth of 
data are presented. The book therefore 
begins where the usual history and exposition 
of aesthetics ends, although in carrying the 
story and the problems forward into the realm 
of empirical data, it may turn a corner which 
many older workers and writers will find a 
little disconcerting. 

The style and organization are simple and 
informal and will therefore be especially 
welcome to students of psychology as well as 
to the many readers in the field of music where 
statistical complexities and methodological 
intricacies are inappropriate. It should serve 
as an indispensable handbook for all workers 
and experimenters in the field of musical 
taste. 

This enlightening and stimulating review of 
music research by one of its most sophisticated 
scholars has two most essential virtues: it is 
clear and complete, and it is never tedious. 
These virtues may be due in part to the fact 
that research in musical taste is still in its 
infancy, characterized more by breadth and 
boldness than by subtle semantic differentia- 
tions and esoteric techniques which are 
inevitable with the rich development of 
materials in other fields. They are also due in 
part to the fact that the author has dealt ex- 
clusively with experimental work, not with 
speculation and argument unsupported by 
empirical data. And they are certainly due 
also in large part to the author’s skill in 
organizing the available studies around signif- 
icant problems, stripping the problems clear 
of metaphysical nonessentials and resetting 
them in a new experimental frame of reference 
no matter how spare and unglamorous the 
new setting may be. To those who may be 
interested in another dimension of musical 
taste, especially if they approach it from a 
more subjective and intuitive point of view, 
these chapters may seem unconvincing, for 
the methods, the vocabulary, the concepts, are 
those of the social scientist and the author 
makes no attempt to meet other disciplines 
on their own grounds. He merely gives a 
clear and full picture of the ground on his 
own side of the pasture. 

In summarizing his views, Mr. Farnsworth 
says (p. 83), “To answer questions about the 
nature of the vibrato, the timbre of a much- 
appreciated violin tone, or the precision with 
which one can sing or play a desired pitch, 
the methods of the laboratory must be em- 
ployed. But whenever the questions concern 
preferences or attitudes, the tools of the 
social scientist will be found more applicable. 
Since the problems with which we have dealt 
in this book are of the latter sort, we have 
called on the methods of the social scientists 
rather than on those of the laboratory man. 
In some degree our tools have been found to 
be unreliable and invalid. We believe however 
that they are good enough to have uncovered 
‘facts’ which are worth the careful consideration 
of musicologists, musical educators, and social 
scientists in general.” And again (p. 3), 
“musical taste is a phenomenon of the social 
sciences, rather than a conglomeration of 
chance responses or a set of absolutes. Like 
all other folkways, musical taste is peculiar 
to a particular group of people, a particular 
place, and a particular period of history. 
No music then can be inherently good or bad, 
for its goodness is only an evaluation by a 
group of men trained to accept a particular 
set of standards.” 

Psychologists who prefer to remain uncon- 
vinced had better not read this book. 


Kate Hevner Mueller 
Indiana University 


Cantril, Hadley. Public opinion, 1935-1946. 
Princeton: Princeton University Press, 1951. 
Pp. lix + 1191. $25.00. 


This is a monumental compendium of 
opinion poll results for a twelve year period. 
It is a product of the Office of Public Opinion 
Research at Princeton prepared by Mrs. 
Mildred Strunk under Cantril’s editorial 
direction. It includes results of polls con- 
ducted by 23 organizations in 16 countries. 
It compresses into a single volume for a twelve 
year period the kind of material presented 
quarterly by Public Opinion Quarterly and the 
International Journal of Opinion and Attitude 
Research. 

The work, however, is far more extensive 
than merely compiling what has been published. 
It involved continuous contact with the 23 
organizations with reference to technical 
matters of sampling and translations. An 
even greater task was the development of a 
suitable scheme of classification and wording 
of both subjects and cross references. This 
was accomplished by adhering to Library of 
Congress subject headings. 

The value of such a compilation is largely 
dependent upon the speed and certainty with 
which a reader may locate the results of any 
poll on any given subject. This seems to have 
been achieved admirably in the 45 page table 
of contents extending alphabetically from 
“A.A.A. See Agricultural Adjustment Act” to 
“Yugoslavia. . . . See World War, 1939-1945: 
Territorial questions (Yugoslavia).” Thus it 
is apparent that it covers almost “every 
subject under the sun.” 

The applied psychologist will find it to be 
an invaluable reference source to what people 
report they believe on important issues in the 
field of industry, agriculture, education, the 
church, political parties and elections, employ- 
ment and unemployment, crime, etc. For 
example, what opinions do people report on 
“sex differences in ability” or on “the employ- 
ment of women”? Or, what about people’s 
opinions about issues in the field of advertising 
or in the field of industrial relations? These 
and many others are recorded cumulatively by 
country and by date. 

The general rule for inclusion was to report 
only results and questions based on a national 
cross-section. This brings together the work 
of such persons and agencies as Gallup, 
Crossley, Roper, the National Opinion Re- 
search Center, and Gallup Foreign Affiliates. 
One wonders why the survey results obtained 
by Link and the Psychological Corporation or 
why the results obtained by Kornhauser’s Poll 
of Experts were omitted. 

Cantril in the Preface states on p. v, “The 
present volume covers surveys from their 
beginning in 1935... .” The statement is 
historically inaccurate since fifteen psychol- 
ogists, cooperating with the Psychological 
Corporation, made their first nation-wide urban 
survey in March, 1932, thus launching what 
has come to be known as the Psychological 
Barometer. This is probably the earliest 
continuous public opinion poll made entirely 
with personal interviews. Applied psychol- 
ogists, in particular, are, for this reason, likely 
to resent the dating of everything in public 
opinion polling from the 1935 reports of 
Roper and Gallup. 

Controversy over a slight historical inac- 
curacy or regret that the results obtained by 
Link or by Kornhauser were omitted, should 
not blind one to the importance of this mon- 
umental work. If this contribution is followed 
up periodically, say at five year intervals, 
social science, both pure and applied, will 
profit enormously from these publications. 
They will provide an indispensable reservoir 
of raw data for the understanding and analysis 
of trends in public opinion on particular 
issues and subjects. 


Donald G. Paterson 
University of Minnesota 


Lucas, D. B., and Britt, S. H. Advertising 
psychology and research. An introductory 
book. New York: McGraw-Hill Book Co., 
1950. Pp. xi + 765. $6.50. 


Even the most critical reader will have 
difficulty finding anything seriously wrong 
with this book, provided that he understands 
its purpose. It is not intended as an advanced 
treatment of the subject. The subtitle makes 
this perfectly clear. Furthermore, it is exten- 
sive rather than intensive. As the preface 
points out, a book could be written on the 
problems covered by almost every chapter. 
It does, however, go into sufficient detail to 
provide a very practical survey of this partic- 
ular field. 

It does not attempt to cover the whole field 
of advertising, marketing research, or psy- 
chology. It covers those parts of all of these 
fields that are important as far as the psy- 
chological aspects of advertising are concerned. 
As the title suggests, both psychological 
principles of advertising and research methods 
used in the field of advertising are included. 
The coverage is broad and the organization 
is logical. 

The objectives of advertising are defined as 
capturing attention; arousing and holding 
interest; and making a useful, lasting impres- 
sion. This leads into a discussion of such 
topics as attention, interest, association, and 
memory. 

The problem of how to attain these objec- 
tives is discussed first from the standpoint of 
advertising appeals including a consideration 
of both general motivation and the use of 
consumer surveys. Then it is approached 
from the standpoint of the techniques of 
presenting the appeals and finally from the 
standpoint of such mechanical factors as 
location, size, layout, color, typography, and 
the structure of radio and television programs. 
Thirteen chapters are devoted to the objectives 
and how to attain them. 

The remaining ten chapters cover tests of 
advertising and the evaluation of media 
audiences. Then there is a section which 
presents questions and exercises for each 
chapter and a glossary to help readers who are 
not familiar with the terminology of this 
particular field. Reading references at the 
end of the chapters also are provided. 

Perhaps the most outstanding characteristic 
of this book is the excellent blending of 
academic soundness and practical experience. 
The academic soundness shows up in a variety 
of ways: familiarity with previous research, 
a systematic approach to each problem, an 
understanding of research methods, and a 
thorough and unbiased analysis of both 
results and methods. Obviously this is es- 
sential in writing a book in this field. How- 
ever, academic soundness alone cannot account 
for the high level attained. Practical ex- 
perience must be given credit for the practical 
viewpoint, the down-to-earth approach, and 
the realistic attitude displayed both in what 
is reported and the way in which it is presented. 

Even the way that such a minor point as 
the stabilization chart is handled demonstrates 
the blending of the academic and the practical. 
Inclusion of a discussion of the stabilization 
chart shows a recognition of the problem that it 
has been used by some practitioners. Realiza- 
tion of its usefulness as an illustrative device 
provides a further indication of a practical 
viewpoint. Then academic soundness enters 
the discussion to point out its weaknesses and 
to recommend the use of formulas for comput- 
ing sampling errors instead of relying upon 
stabilization charts. 

In an introductory book, there always is 
the problem of staying on a relatively elemen- 
tary level without excluding material that is 
essential to a practical understanding of the 
field. The authors have worked out a satis- 
factory solution by avoiding too much over- 
simplification but still covering each topic in 
a manner that should be easy for most 
beginners to follow. The explanation of 
Hooperatings provides one example. The 
formula is not given, but there is a good 
discussion of the logic underlying the formula. 
Even the problems involved in the treatment 
of busy signals are mentioned. Similarly, 
no attempt is made to cover the technical 
aspects of sampling, but a logical discussion of 
probability sampling is included. 

In a field in which almost every one has a 
favorite method or methods, really unbiased 
evaluations of tests of advertising might be 
expected to be relatively rare. However, 
there are not any indications of bias in this 
book. If one were looking for bias, he surely 
would be most likely to find it in the discussion 
of a method with which one of the authors 
has been directly associated. It just is not 
there. The controlled-recognition method is 
evaluated in as unbiased a manner, including 
its limitations, as any tough minded impartial 
observer could want. 

Alfred C. Welch 


Knox Reeves Advertising, Inc., 
Minneapolis 


Revised edition. W. C. Allee. New York: Henry 
Schuman, Inc., 1951. Pp. 233. $3.50. 
ton: Princeton University Press, 1951. Pp. 249. 